[BioC] edgeR topTags output (logFC)

Gordon K Smyth smyth at wehi.EDU.AU
Wed Aug 28 07:52:33 CEST 2013


Hi Susan,

Thanks, I see now.

Your experiment has 152 genotype x treatment combinations, and all of them 
occur in your experiment.  I strongly suggest that, instead of using an 
interaction model formula, you simply define a factor that takes 152 
levels and then make comparisons between the levels.  This fits the same 
model as you have fitted, but is parametrized in a way that is more 
interpretable and useful.  See Section 3.3.1 of the edgeR User's Guide. 
This is almost always the best approach because you know exactly what each 
comparison means.

I am not a fan of interaction model formulas like geno*trt for genomic 
experiments.  I don't think that the coefficients have useful meanings for 
most experiments.  In particular, the linear effect terms (like trtshade) 
cannot be interpreted because of the present of the interactions.

On Tue, 27 Aug 2013, Susan Bush wrote:

> Okay, sorry.  Here's more information:
>
> The model design I am using is       *design <- model.matrix( ~ geno * trt)*,
> where "geno" has 76 values and "trt" has 2 (sun, shade)
>
> Using glmLRT, I am testing "trt", so      *  lrt <- glmLRT(dge, fit , coef
> = grep("trt", colnames(design))*
>
> The logFC values returned by topTags, then, are for columns
> *trtshade, geno2.trtshade, geno3.trtshade
> *, etc.

This will give you a test of whether any of the genotypes show a treatment 
effect.  This test is on 76 degrees of freedom, so there should be 76 
columns for this test.  I could explain what the individual coefficients 
mean mathematically, but I don't think that they are useful for you.

> I also test "geno", where the logFC values are for columns        *geno2,
> geno3, geno2.trtshade, geno3.trtshade*, etc.

This is a test of whether there any differences between any of the 
genotypes in either of the treatment conditions (on 150 degrees of 
freedom).

> Ultimately I test the interaction, so logFC values are for columns
>  *geno2.trtshade,
> geno3.trtshade, geno4.trtshade*, etc.

All the genotypes are versus geno1, and treatment is versus sun, so the 
coefficient "geno2.trtshade" measures how much larger the logFC for shade 
vs sun is for geno2 as compared to geno1.  In other words, the coefficient 
means:

   geno2.trtshade =
   (geno2.shade - geno2.sun) - (geno1.shade - geno1.sun)

Best wishes
Gordon

> I am trying to figure out what these logFC coefficients are compared to, in
> each case.  If I am interested in specific genotype effects, will I get a
> more appropriate answer by testing "geno2" individually, rather than "geno"
> all at once?
>
> Thanks - I do appreciate it.
>
>
>
>
>
>
>
> *****
> Susan Bush, Ph. D.
> Postdoctoral Scholar
> Maloof Lab, Dept of Plant Biology
> University of California, Davis
> 1 Shields Avenue
> Davis, CA 95616
>
> email: smbush at ucdavis.edu
> lab: 530-752-8086
> cell: 507-828-8958
>
>
> On Tue, Aug 27, 2013 at 5:05 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Susan,
>>
>>  Date: Mon, 26 Aug 2013 18:28:03 -0700 (PDT)
>>> From: "Susan [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, smbush at ucdavis.edu
>>> Subject: [BioC] edgeR topTags output (logFC)
>>>
>>>
>>> I'm just wanting to confirm my understanding of the edgeR topTags()
>>> output.  I'm using a design matrix with many coefficients (76 genotypes, 2
>>> treatments[sun, shade]).
>>>
>>> 1) In the topTags output, I am looking at logFC which represents the
>>> estimated or predicted means -- NOT coefficients, correct?
>>>
>>
>> Incorrect.  As the abbreviation suggests, logFC gives you
>> log-fold-changes.  logFC are indeed coefficients, not group means.
>>
>>  2) I have selected one parental genotype to be my reference genotype, and
>>> I am comfortable interpreting the logFC values for the other genotypes as
>>> comparisons to this genotype.  I have selected one treatment as the
>>> reference (sun).  When I see the logFC values for 'trtshade', I assume this
>>> is also in comparison to the reference genotype in the reference treatment.
>>>
>>
>> Yes, assuming that trt is what you have called the treatment factor, then
>> trtshade would the coefficient estimating the log-fold-change between the
>> shade and sun treatments.
>>
>>  My question is about looking at 'geno2.trtshade' values - are these
>>> values in comparison to 'trtshade' or 'geno2' or the reference
>>> genotype/reference treatment values?
>>>
>>
>> Without knowing what model you have fitted, or which coefficients you have
>> tested for, it is impossible to tell.
>>
>> Best wishes
>> Gordon
>>
>>  For example:  this is calling glmLRT with coef = grep("trt",
>>> colnames(Design)) to identify treatment effects
>>>
>>>           Row.names       logFC.trtsha    logFC.genoIL1.1.trtsha   logCPM
>>>        LR        PValue           FDR
>>> 1 Solyc08g078300.2.1    0.9440264           -0.379030370 4.207443
>>> 1135.8061 1.450273e-188 1.637938e-184
>>> 2 Solyc06g060830.2.1    0.9630343            0.038988710 4.511893
>>> 1095.1525 2.539194e-180 1.433883e-176
>>> 3 Solyc06g067910.2.1    1.2468599           -0.096499016 4.795789
>>> 1040.5813 2.722511e-169 1.024935e-165
>>> 4 Solyc06g008580.2.1    1.1500683           -0.451731914 3.300982
>>>  696.1677 6.043868e-101  1.706486e-97
>>> 5 Solyc07g017600.2.1    0.5026292            0.007208458 5.209938
>>>  694.2070 1.451776e-100  3.279271e-97
>>> 6 Solyc10g083760.1.1   -0.2877568            0.045952894 7.033394
>>>  651.8629  2.232759e-92  4.202796e-89
>>>
>>> Thanks so much!
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>>  sessionInfo()
>>>>
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C
>>>  LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C        LC_PAPER=C
>>> [8] LC_NAME=C            LC_ADDRESS=C         LC_TELEPHONE=C
>>> LC_MEASUREMENT=C     LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> base
>>>
>>> other attached packages:
>>> [1] ggplot2_0.9.3.1 reshape_0.8.4   plyr_1.8        edgeR_3.2.3
>>> limma_3.16.4
>>>
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-26        RColorBrewer_1.0-5 colorspace_1.2-2
>>> dichromat_2.0-0    digest_0.6.3       grid_3.0.1         gtable_0.1.2
>>> labeling_0.1
>>> [9] munsell_0.4        proto_0.3-10       reshape2_1.2.2     scales_0.2.3
>>>       stringr_0.6.2      tools_3.0.1
>>>
>>
>>
>> ______________________________**______________________________**__________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________**______________________________**__________
>>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list