[BioC] creating contrast matrix (limma) for two factorial in R

Fri Mar 14 01:16:28 CET 2014

Hi Ryan-

Thank you very much for your comments and code contributions. Applying
your suggestions seems to produce the results that I was looking for. I
think the hardest thing has been to conceptualize the specific tests that
I was after. As you mentioned, testing genotype differences for one
condition makes more sense than what I had previously attempted. Likewise,
testing treatment differences within one genotype at a time. Now I¹ll need
to spend some additional time with the data beyond a quick overview.

Regarding other count specific packages, I was aware of both DESeq2 and
limma¹s voom, as alternatives to deal with spectral count data. I am
actually in the process of utilizing the DESeq2 package for my RNAseq
datasets, and I have thought about applying them towards my proteomic work
once I am comfortable with it. My initial hesitation has been the criteria
I have used for normalizing the spectral counts (Normalized Spectral
Abundance Factor), which accounts for sample-to-sample variation of
replicates and the fact that longer proteins can theoretically produce
more peptide/spectral hits than shorter proteins. This criteria includes
excluding any samples from quantification that have missing counts within
>=1 biological replicate, and enforcing a minimum threshold of 5-counts
>across all biological replicates for a protein to be included for
>downstream analysis. I will have to try one of the packages out with the
>same filtered dataset that exludes any of the aforementioned Œmissing¹
>protein counts.

Thanks again!
Ryan 

Ryan M. Ghan
Graduate Research Assistant
Department of Biochemistry and Molecular Biology, MS 330
University of Nevada, Reno
Reno, NV 89557
Howard 205
Lab: (775) 784-4225
rghan at unr.edu

On 3/13/14, 4:31 PM, "Ryan" <rct at thompsonclan.org> wrote:

>Hi Ryan,
>
>Your design matrix is fine. However, your mistake is trying to stuff
>each one of your tests into a single contrast. As an example, for the
>interaction test, you are testing the interaction between a 5-level
>factor and a 2-level factor, so the test has 4 degrees of freedom. This
>means that you'll need 4 contrasts to represent this test:
>
>interaction.contrast.exprs <- sprintf("(g%s.s-g%s.c) - (g1.s-g1.c)",
>2:5, 2:5)
>interaction.contrast.matrix <-
>makeContrasts(contrasts=interaction.contrast.exprs, levels=design)
>
>The resulting contrast matrix should have 4 columns, one for each
>contrast in your test.
>
>For your other two tests, you should be aware that the expression
>"(g1.s + g1.c)/2" means "an equal mix of control and stressed", which
>is probably not what you want. I would say that it makes more sense to
>just compare the genotypes within a single condition. To test for
>genotype differences in the control condition, this is again a
>4-degree-of-freedom test, so you'll need a similar approach to the
>above:
>
>control.genotype.contrast.exprs <- sprintf("g%s.c-g1.c", 2:5, 2:5)
>control.genotype.contrast.matrix <-
>makeContrasts(contrasts=control.genotype.contrast.exprs, levels=design)
>
>You could also to the same with the stressed condition as a separate
>test. The same caveat applies to your treatment contrast, which is only
>correct if you want to know the expression differences for control vs
>stressed in an equal mix of all 5 genotypes (i.e. a hypothetical
>population with 20% of each genotype). Again, the best approach is
>probably to test control vs stressed in each genotype separately, i.e.
>5 tests of one contrast each.
>
>Lastly, since you said you are working with spectral count data, you
>might want to consider using the edgeR or DESeq2 packages, which are
>specifically designed for analyzing count data, or else to use limma's
>voom function to perform the log2-transform while accounting for
>heteroskedasticity.
>
>Anyway, that's my understanding. I hope it helps you. Anyone else can
>feel free to correct me if I've made any errors in my explanation.
>
>-Ryan Thompson
>
>On Thu Mar 13 15:14:20 2014, Ryan M Ghan wrote:
>> Hello limma community-
>>
>> I am attempting to construct a contrast matrix that I can run in R,
>>using the limma bioconductor package, but I am not sure that I have
>>coded the contrast matrix correctly. A previous
>>post<https://stats.stackexchange.com/questions/64249/creating-contrast-ma
>>trix-for-linear-regression-in-r?newreg=add2674ca9d04b7eb85fad255b45b7f5>
>>on stats.stackexchange.com, the bioconductor mailing list, and the limma
>>guide were helpful, but my two factorial design is more complicated than
>>what is illustrated there.
>>
>> The first factor is the treatment, with two levels (control=c and
>>stress=s), and the second factor is the genotype, with five levels (g1,
>>g2, g3, g4, g5). Each genotype/treatment consists of 3-biological
>>replicates (30xsamples total). My dataset has already been normalized
>>and log2 transformed (Should I even start with log transformed data?).
>>It consists of 1208 proteins (based upon spectral counting for those
>>that care) that measures protein abundance differences in the five
>>genotypes and two treatments. The dataset is complete, meaning each
>>sample/condition has a datapoint, and appears to be normally distributed
>>(histogram and qqplots).
>>
>> ## Session information
>>> sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] limma_3.18.13
>>
>> loaded via a namespace (and not attached):
>>   [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4
>>ggplot2_0.9.3.1    grid_3.0.2         gtable_0.1.2
>>   [7] labeling_0.2       MASS_7.3-30        munsell_0.4.2
>>plyr_1.8.1         proto_0.3-10       RColorBrewer_1.0-5
>> [13] Rcpp_0.11.0        reshape2_1.2.2     scales_0.2.3
>>stringr_0.6.2      tools_3.0.2
>>
>> ## Subset of the data:
>>      proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3
>>g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3
>>g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3
>>      prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821
>>-9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585
>>-10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896
>>-10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795
>>-10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373
>>-9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101
>>-9.957684691
>>      prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294
>>-9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795
>>-10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166
>>-10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978
>>-9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592
>>-9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886
>>-9.95430439
>>      prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998
>>-11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486
>>-12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031
>>-12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243
>>-11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512
>>-12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369
>>-12.437377
>>      prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257
>>-11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186
>>-11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753
>>-11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929
>>-10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598
>>-11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703
>>-11.09493115
>>      prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973
>>-9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651
>>-9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382
>>-10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376
>>-9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894
>>-9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792
>>-9.65324573
>>      prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023
>>-10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798
>>-10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025
>>-10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785
>>-10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634
>>-10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552
>>-10.13061368
>>      prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129
>>-9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462
>>-10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285
>>-10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449
>>-10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166
>>-10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766
>>-10.0695915
>>      prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092
>>-10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007
>>-10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728
>>-10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937
>>-10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115
>>-10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989
>>-10.62675183
>>      prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675
>>-8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958
>>-8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294
>>-8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159
>>-8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278
>>-8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701
>>-8.925717389
>>      prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791
>>-10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196
>>-11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369
>>-10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945
>>-10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216
>>-11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792
>>-10.86048168
>>
>> ## Code that I am attempting to utilize:
>>      proteins.mat <- as.matrix(proteins.df)
>>      treat = 
>>c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c","g5.s","g5.c")
>>      factors = gl(10,3,labels=treat)
>>      design <- model.matrix(~0+factors)
>>      colnames(design) <- treat
>>
>> ## Here is the design for my model:
>>      > design
>>         g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c
>>      1     1    0    0    0    0    0    0    0    0    0
>>      2     1    0    0    0    0    0    0    0    0    0
>>      3     1    0    0    0    0    0    0    0    0    0
>>      4     0    1    0    0    0    0    0    0    0    0
>>      5     0    1    0    0    0    0    0    0    0    0
>>      6     0    1    0    0    0    0    0    0    0    0
>>      7     0    0    1    0    0    0    0    0    0    0
>>      8     0    0    1    0    0    0    0    0    0    0
>>      9     0    0    1    0    0    0    0    0    0    0
>>      10    0    0    0    1    0    0    0    0    0    0
>>      11    0    0    0    1    0    0    0    0    0    0
>>      12    0    0    0    1    0    0    0    0    0    0
>>      13    0    0    0    0    1    0    0    0    0    0
>>      14    0    0    0    0    1    0    0    0    0    0
>>      15    0    0    0    0    1    0    0    0    0    0
>>      16    0    0    0    0    0    1    0    0    0    0
>>      17    0    0    0    0    0    1    0    0    0    0
>>      18    0    0    0    0    0    1    0    0    0    0
>>      19    0    0    0    0    0    0    1    0    0    0
>>      20    0    0    0    0    0    0    1    0    0    0
>>      21    0    0    0    0    0    0    1    0    0    0
>>      22    0    0    0    0    0    0    0    1    0    0
>>      23    0    0    0    0    0    0    0    1    0    0
>>      24    0    0    0    0    0    0    0    1    0    0
>>      25    0    0    0    0    0    0    0    0    1    0
>>      26    0    0    0    0    0    0    0    0    1    0
>>      27    0    0    0    0    0    0    0    0    1    0
>>      28    0    0    0    0    0    0    0    0    0    1
>>      29    0    0    0    0    0    0    0    0    0    1
>>      30    0    0    0    0    0    0    0    0    0    1
>>      attr(,"assign")
>>      [1] 1 1 1 1 1 1 1 1 1 1
>>      attr(,"contrasts")
>>      attr(,"contrasts")$factors
>>      [1] "contr.treatment"
>>
>> ## My contrast model. I want to test for interaction, differences
>>between genotypes, and to see if specific genotypes respond differently
>>to the treatment from one another:
>>      cmtx <- makeContrasts(
>>        
>>GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4.c)-(g5.s
>>-g5.c),
>>        
>>genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c),
>>        Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c),
>>        levels=design)
>>
>> ## What my contrast model looks like, but I don't think this is correct:
>>      > cmtx
>>            Contrasts
>>      Levels GenotypevsTreatment Genotype Treatment
>>        g1.s                   1        1         1
>>        g1.c                  -1        1        -1
>>        g2.s                  -1       -1         1
>>        g2.c                   1       -1        -1
>>        g3.s                  -1       -1         1
>>        g3.c                   1       -1        -1
>>        g4.s                  -1       -1         1
>>        g4.c                   1       -1        -1
>>        g5.s                  -1       -1         1
>>        g5.c                   1       -1        -1
>>
>> ## Fitting the linear model by empirical bayes statistics for
>>differential expression:
>>      fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx))
>>      topTable(fit, adjust.method="BH")
>>
>> ## The below topTable proteins are the same as the subset of data from
>>above:
>>      > topTable(fit, adjust.method="BH")
>>             GenotypevsTreatment Genotype    Treatment    AveExpr
>>F      P.Value    adj.P.Val
>>      prot1        -0.40786338 60.30918  0.073054723  -9.918822 17308.55
>>1.124646e-39 1.232079e-36
>>      prot2        -0.09255219 59.60864  0.061701713  -9.897968 15801.43
>>3.304533e-39 1.232079e-36
>>      prot3        -0.23880357 73.48557  0.536672827 -12.090016 15650.65
>>3.701463e-39 1.232079e-36
>>      prot4        -0.11834000 66.76931  0.305471823 -11.122034 15522.46
>>4.079731e-39 1.232079e-36
>>      prot5        -0.15210172 59.21509 -0.183849274  -9.876144 14734.51
>>7.556112e-39 1.423908e-36
>>      prot6        -0.15761118 62.87467  0.155340561 -10.389362 14565.87
>>8.658504e-39 1.423908e-36
>>      prot7        -0.03886438 61.15652 -0.166795475 -10.182834 14551.88
>>8.757515e-39 1.423908e-36
>>      prot8        -0.10425341 64.63523 -0.186904167 -10.780359 14461.18
>>9.429854e-39 1.423908e-36
>>      prot9        -0.03426380 53.48057  0.007403722  -8.854471 13713.49
>>1.767090e-38 2.021378e-36
>>      prot10       -0.75250251 66.62646  0.327497120 -10.894506 13480.51
>>2.164184e-38 2.021378e-36
>>
>> I think I am naively constructing the contrast matrix. The result for
>>Genotype (above) looks incorrect to me. Ideally, I would like to figure
>>this out because I wish to apply similar modeling techniques to an
>>RNAseq dataset from the same experimental samples that I used for my
>>protein work.
>>
>> Also, in order to make the correct comparisons, do I also need to
>>employ the ŒduplicateCorrelation¹ function (pg. 50, lima guide 9.7
>>Multi-level Experiments)? Something like this:
>>
>> corfit <- duplicateCorrelation(proteins.mat, design)
>>> corfit$consensus
>> [1] -0.7888584
>> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design,
>>corrleation=corfit$consensus), cmtx))
>>
>> Any insight/corrections would be greatly appreciated, and if I have
>>forgotten some information that would aid in helping me with this
>>problem I am happy to cooperate.
>>
>> Cheers,
>> Ryan
>>
>> Ryan M. Ghan
>> Graduate Research Assistant
>> Department of Biochemistry and Molecular Biology, MS 330
>> University of Nevada, Reno
>> Reno, NV 89557
>> Howard 205
>> Lab: (775) 784-4225
>> rghan at unr.edu
>>
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor