[BioC] creating contrast matrix (limma) for two factorial in R
Ryan M Ghan
rghan at unr.edu
Fri Mar 14 01:16:28 CET 2014
Hi Ryan-
Thank you very much for your comments and code contributions. Applying
your suggestions seems to produce the results that I was looking for. I
think the hardest thing has been to conceptualize the specific tests that
I was after. As you mentioned, testing genotype differences for one
condition makes more sense than what I had previously attempted. Likewise,
testing treatment differences within one genotype at a time. Now I¹ll need
to spend some additional time with the data beyond a quick overview.
Regarding other count specific packages, I was aware of both DESeq2 and
limma¹s voom, as alternatives to deal with spectral count data. I am
actually in the process of utilizing the DESeq2 package for my RNAseq
datasets, and I have thought about applying them towards my proteomic work
once I am comfortable with it. My initial hesitation has been the criteria
I have used for normalizing the spectral counts (Normalized Spectral
Abundance Factor), which accounts for sample-to-sample variation of
replicates and the fact that longer proteins can theoretically produce
more peptide/spectral hits than shorter proteins. This criteria includes
excluding any samples from quantification that have missing counts within
>=1 biological replicate, and enforcing a minimum threshold of 5-counts
>across all biological replicates for a protein to be included for
>downstream analysis. I will have to try one of the packages out with the
>same filtered dataset that exludes any of the aforementioned Œmissing¹
>protein counts.
Thanks again!
Ryan
Ryan M. Ghan
Graduate Research Assistant
Department of Biochemistry and Molecular Biology, MS 330
University of Nevada, Reno
Reno, NV 89557
Howard 205
Lab: (775) 784-4225
rghan at unr.edu
On 3/13/14, 4:31 PM, "Ryan" <rct at thompsonclan.org> wrote:
>Hi Ryan,
>
>Your design matrix is fine. However, your mistake is trying to stuff
>each one of your tests into a single contrast. As an example, for the
>interaction test, you are testing the interaction between a 5-level
>factor and a 2-level factor, so the test has 4 degrees of freedom. This
>means that you'll need 4 contrasts to represent this test:
>
>interaction.contrast.exprs <- sprintf("(g%s.s-g%s.c) - (g1.s-g1.c)",
>2:5, 2:5)
>interaction.contrast.matrix <-
>makeContrasts(contrasts=interaction.contrast.exprs, levels=design)
>
>The resulting contrast matrix should have 4 columns, one for each
>contrast in your test.
>
>For your other two tests, you should be aware that the expression
>"(g1.s + g1.c)/2" means "an equal mix of control and stressed", which
>is probably not what you want. I would say that it makes more sense to
>just compare the genotypes within a single condition. To test for
>genotype differences in the control condition, this is again a
>4-degree-of-freedom test, so you'll need a similar approach to the
>above:
>
>control.genotype.contrast.exprs <- sprintf("g%s.c-g1.c", 2:5, 2:5)
>control.genotype.contrast.matrix <-
>makeContrasts(contrasts=control.genotype.contrast.exprs, levels=design)
>
>You could also to the same with the stressed condition as a separate
>test. The same caveat applies to your treatment contrast, which is only
>correct if you want to know the expression differences for control vs
>stressed in an equal mix of all 5 genotypes (i.e. a hypothetical
>population with 20% of each genotype). Again, the best approach is
>probably to test control vs stressed in each genotype separately, i.e.
>5 tests of one contrast each.
>
>Lastly, since you said you are working with spectral count data, you
>might want to consider using the edgeR or DESeq2 packages, which are
>specifically designed for analyzing count data, or else to use limma's
>voom function to perform the log2-transform while accounting for
>heteroskedasticity.
>
>Anyway, that's my understanding. I hope it helps you. Anyone else can
>feel free to correct me if I've made any errors in my explanation.
>
>-Ryan Thompson
>
>On Thu Mar 13 15:14:20 2014, Ryan M Ghan wrote:
>> Hello limma community-
>>
>> I am attempting to construct a contrast matrix that I can run in R,
>>using the limma bioconductor package, but I am not sure that I have
>>coded the contrast matrix correctly. A previous
>>post<https://stats.stackexchange.com/questions/64249/creating-contrast-ma
>>trix-for-linear-regression-in-r?newreg=add2674ca9d04b7eb85fad255b45b7f5>
>>on stats.stackexchange.com, the bioconductor mailing list, and the limma
>>guide were helpful, but my two factorial design is more complicated than
>>what is illustrated there.
>>
>> The first factor is the treatment, with two levels (control=c and
>>stress=s), and the second factor is the genotype, with five levels (g1,
>>g2, g3, g4, g5). Each genotype/treatment consists of 3-biological
>>replicates (30xsamples total). My dataset has already been normalized
>>and log2 transformed (Should I even start with log transformed data?).
>>It consists of 1208 proteins (based upon spectral counting for those
>>that care) that measures protein abundance differences in the five
>>genotypes and two treatments. The dataset is complete, meaning each
>>sample/condition has a datapoint, and appears to be normally distributed
>>(histogram and qqplots).
>>
>> ## Session information
>>> sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] limma_3.18.13
>>
>> loaded via a namespace (and not attached):
>> [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
>>ggplot2_0.9.3.1 grid_3.0.2 gtable_0.1.2
>> [7] labeling_0.2 MASS_7.3-30 munsell_0.4.2
>>plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
>> [13] Rcpp_0.11.0 reshape2_1.2.2 scales_0.2.3
>>stringr_0.6.2 tools_3.0.2
>>
>> ## Subset of the data:
>> proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3
>>g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3
>>g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3
>> prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821
>>-9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585
>>-10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896
>>-10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795
>>-10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373
>>-9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101
>>-9.957684691
>> prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294
>>-9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795
>>-10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166
>>-10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978
>>-9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592
>>-9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886
>>-9.95430439
>> prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998
>>-11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486
>>-12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031
>>-12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243
>>-11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512
>>-12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369
>>-12.437377
>> prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257
>>-11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186
>>-11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753
>>-11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929
>>-10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598
>>-11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703
>>-11.09493115
>> prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973
>>-9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651
>>-9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382
>>-10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376
>>-9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894
>>-9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792
>>-9.65324573
>> prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023
>>-10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798
>>-10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025
>>-10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785
>>-10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634
>>-10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552
>>-10.13061368
>> prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129
>>-9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462
>>-10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285
>>-10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449
>>-10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166
>>-10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766
>>-10.0695915
>> prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092
>>-10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007
>>-10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728
>>-10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937
>>-10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115
>>-10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989
>>-10.62675183
>> prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675
>>-8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958
>>-8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294
>>-8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159
>>-8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278
>>-8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701
>>-8.925717389
>> prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791
>>-10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196
>>-11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369
>>-10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945
>>-10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216
>>-11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792
>>-10.86048168
>>
>> ## Code that I am attempting to utilize:
>> proteins.mat <- as.matrix(proteins.df)
>> treat =
>>c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c","g5.s","g5.c")
>> factors = gl(10,3,labels=treat)
>> design <- model.matrix(~0+factors)
>> colnames(design) <- treat
>>
>> ## Here is the design for my model:
>> > design
>> g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c
>> 1 1 0 0 0 0 0 0 0 0 0
>> 2 1 0 0 0 0 0 0 0 0 0
>> 3 1 0 0 0 0 0 0 0 0 0
>> 4 0 1 0 0 0 0 0 0 0 0
>> 5 0 1 0 0 0 0 0 0 0 0
>> 6 0 1 0 0 0 0 0 0 0 0
>> 7 0 0 1 0 0 0 0 0 0 0
>> 8 0 0 1 0 0 0 0 0 0 0
>> 9 0 0 1 0 0 0 0 0 0 0
>> 10 0 0 0 1 0 0 0 0 0 0
>> 11 0 0 0 1 0 0 0 0 0 0
>> 12 0 0 0 1 0 0 0 0 0 0
>> 13 0 0 0 0 1 0 0 0 0 0
>> 14 0 0 0 0 1 0 0 0 0 0
>> 15 0 0 0 0 1 0 0 0 0 0
>> 16 0 0 0 0 0 1 0 0 0 0
>> 17 0 0 0 0 0 1 0 0 0 0
>> 18 0 0 0 0 0 1 0 0 0 0
>> 19 0 0 0 0 0 0 1 0 0 0
>> 20 0 0 0 0 0 0 1 0 0 0
>> 21 0 0 0 0 0 0 1 0 0 0
>> 22 0 0 0 0 0 0 0 1 0 0
>> 23 0 0 0 0 0 0 0 1 0 0
>> 24 0 0 0 0 0 0 0 1 0 0
>> 25 0 0 0 0 0 0 0 0 1 0
>> 26 0 0 0 0 0 0 0 0 1 0
>> 27 0 0 0 0 0 0 0 0 1 0
>> 28 0 0 0 0 0 0 0 0 0 1
>> 29 0 0 0 0 0 0 0 0 0 1
>> 30 0 0 0 0 0 0 0 0 0 1
>> attr(,"assign")
>> [1] 1 1 1 1 1 1 1 1 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$factors
>> [1] "contr.treatment"
>>
>> ## My contrast model. I want to test for interaction, differences
>>between genotypes, and to see if specific genotypes respond differently
>>to the treatment from one another:
>> cmtx <- makeContrasts(
>>
>>GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4.c)-(g5.s
>>-g5.c),
>>
>>genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c),
>> Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c),
>> levels=design)
>>
>> ## What my contrast model looks like, but I don't think this is correct:
>> > cmtx
>> Contrasts
>> Levels GenotypevsTreatment Genotype Treatment
>> g1.s 1 1 1
>> g1.c -1 1 -1
>> g2.s -1 -1 1
>> g2.c 1 -1 -1
>> g3.s -1 -1 1
>> g3.c 1 -1 -1
>> g4.s -1 -1 1
>> g4.c 1 -1 -1
>> g5.s -1 -1 1
>> g5.c 1 -1 -1
>>
>> ## Fitting the linear model by empirical bayes statistics for
>>differential expression:
>> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx))
>> topTable(fit, adjust.method="BH")
>>
>> ## The below topTable proteins are the same as the subset of data from
>>above:
>> > topTable(fit, adjust.method="BH")
>> GenotypevsTreatment Genotype Treatment AveExpr
>>F P.Value adj.P.Val
>> prot1 -0.40786338 60.30918 0.073054723 -9.918822 17308.55
>>1.124646e-39 1.232079e-36
>> prot2 -0.09255219 59.60864 0.061701713 -9.897968 15801.43
>>3.304533e-39 1.232079e-36
>> prot3 -0.23880357 73.48557 0.536672827 -12.090016 15650.65
>>3.701463e-39 1.232079e-36
>> prot4 -0.11834000 66.76931 0.305471823 -11.122034 15522.46
>>4.079731e-39 1.232079e-36
>> prot5 -0.15210172 59.21509 -0.183849274 -9.876144 14734.51
>>7.556112e-39 1.423908e-36
>> prot6 -0.15761118 62.87467 0.155340561 -10.389362 14565.87
>>8.658504e-39 1.423908e-36
>> prot7 -0.03886438 61.15652 -0.166795475 -10.182834 14551.88
>>8.757515e-39 1.423908e-36
>> prot8 -0.10425341 64.63523 -0.186904167 -10.780359 14461.18
>>9.429854e-39 1.423908e-36
>> prot9 -0.03426380 53.48057 0.007403722 -8.854471 13713.49
>>1.767090e-38 2.021378e-36
>> prot10 -0.75250251 66.62646 0.327497120 -10.894506 13480.51
>>2.164184e-38 2.021378e-36
>>
>> I think I am naively constructing the contrast matrix. The result for
>>Genotype (above) looks incorrect to me. Ideally, I would like to figure
>>this out because I wish to apply similar modeling techniques to an
>>RNAseq dataset from the same experimental samples that I used for my
>>protein work.
>>
>> Also, in order to make the correct comparisons, do I also need to
>>employ the ŒduplicateCorrelation¹ function (pg. 50, lima guide 9.7
>>Multi-level Experiments)? Something like this:
>>
>> corfit <- duplicateCorrelation(proteins.mat, design)
>>> corfit$consensus
>> [1] -0.7888584
>> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design,
>>corrleation=corfit$consensus), cmtx))
>>
>> Any insight/corrections would be greatly appreciated, and if I have
>>forgotten some information that would aid in helping me with this
>>problem I am happy to cooperate.
>>
>> Cheers,
>> Ryan
>>
>> Ryan M. Ghan
>> Graduate Research Assistant
>> Department of Biochemistry and Molecular Biology, MS 330
>> University of Nevada, Reno
>> Reno, NV 89557
>> Howard 205
>> Lab: (775) 784-4225
>> rghan at unr.edu
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list