[BioC] Antw: Re: limma voom: How to create double contrasts?

James W. MacDonald jmacdon at uw.edu
Wed Jul 18 16:50:21 CEST 2012


Hi Christian,

On 7/18/2012 9:55 AM, Christian Ametz wrote:
> Hi James
> thanks again for this excellent answer which helps me a lot. I've set 
> up the analysis pipeline for edgeR as well but since its contrast 
> model is even a bit more complicated to me, I've stuck to voom where I 
> can use the same contrasts as in the microarray experiments. I 
> compared edgeR, DESeq and voom results for pariwise comparisons and 
> they all share the "most significant" DE transcripts in common (ie 
> 15000 out of 18000) with just some transcripts unique for one of the 
> methods.
>
> Just one more question: "So you are literally looking for any gene 
> where C1F-C1M-C2F+C2M is significantly different from zero, regardless 
> of how that came about." Zero is in this case the overall mean 
> calculated by the samples not used in the contrast?

No. The way you have constructed your design matrix means that all of 
the coefficients represent the mean expression for each sample type 
(e.g., C1F is the sample mean for all of the C1 Fusariam treated 
samples). When you set up the contrasts, you are literally specifying 
the algebra to be used in constructing the numerator of your contrast 
t-statistic.

So (C1F-C1M)-(C2F-C2M) means that you want to take the difference 
between the mean of the Fusarium and mock treated samples for both the 
C1 and C2 genotypes, and then compute the difference between those 
differences. So I am not sure where you got the idea about 'samples not 
in the contrast'. The only thing being tested are the samples that are 
IN the contrast.

Once you have calculated the numerator for a given gene, you then need 
to decide if this value is different from zero, which would indicate 
that the two genotypes react differently to Fusarium treatment. This is 
a fundamental aspect to most statistics - the goal is to test two 
hypotheses and see if there is enough evidence to reject one of them. In 
the case of most t-statistics (e.g., these contrasts) the null 
hypothesis is that the numerator is equal to zero. The alternative 
hypothesis is (usually) that the numerator is not equal to zero. We are 
testing to see if we can reject the null hypothesis that the numerator 
is zero, which is the same thing as saying that there appears to be a 
difference between C1F-C1M and C2F-C2M. Or alternatively that the two 
genotypes react differently to Fusarium treatment.

After computing the numerator, we then have to decide if it is 
significantly different from zero. To do this, we need to estimate how 
accurate we think this numerator is, in order to decide if it is far 
enough away from zero for us to confidently say it is not zero. Note 
here that we are always estimating these measurements, so there is 
always error in the estimate. In addition, there is always underlying 
biological variability. There is always the possibility that the errors 
and variability will accumulate in such a way that the numerator is 
different from zero, but only because of inaccuracies in our 
measurements and inherent variability, rather than underlying biological 
differences.

So to decide if the numerator is different from zero, we divide by a 
measure of intra-sample variability. We use the intra-sample variability 
as a yardstick (meter stick for you, I suppose ;-D) to help decide if 
the numerator is different from zero. Any genes with low intra-sample 
variability will not require much difference in the numerator for us to 
say it is different from zero, conversely if the intra-sample 
variability is large, then the numerator will have to be correspondingly 
large in order for us to say it is different from zero.

When you compute the contrast, the resulting statistic is assumed to 
follow the t-distribution. We already know the range of values to expect 
for a t-statistic if the numerator really is zero, so we compare your 
observed t-statistic to the expected values. If the observed t-statistic 
is bigger or smaller than we expect, then we assume that is because the 
numerator really isn't equal to zero, and we reject the null hypothesis. 
That is the statistician's viewpoint.  An alternative is to say that we 
have a significant t-statistic, which indicates that there really is a 
difference between how the two genotypes react to Fusarium treatment.


Best,

Jim


> Thank you!
> All the best
> Christian
>
> >>> "James W. MacDonald" <jmacdon at uw.edu> 7/18/2012 3:43 >>>
> Hi Christian,
>
> On 7/18/2012 9:11 AM, Christian Ametz wrote:
> > Hi James,
> > thanks for your help. The problem with this interaction is, that I
> > find nearly none DE transcripts in our RNA-Seq experiment (3
> > replicates) while based on our microarray results we expect something
> > in the range of a few hundreds...
>
> I am currently doing some comparisons between RNA-seq and microarray
> results as well (using edgeR rather than voom()), and see surprisingly
> little overlap between the two. I can come up with many reasons why this
> might be so, but without knowing the true underlying status of the
> transcript abundance it is not possible to know which platform does a
> better job of estimating what is really happening. Just knowing that
> they are different isn't particularly enlightening.
>
>
> > Am I correct that this interaction finds significance if a) the
> > expression of the two groups is sig. different, or b) the fold-change
> > ratio is different (ie. C1 goes from 100 to 1000 when treating with
> > Fusarium compared to mock while C2 goes just from 50 to 100)  or) C1
> > is upregulated while C2 is downregulated when treated with Fusarium
> > and vice versa.
>
> There are two issues here. First, the numerator of the interaction
> contrast is exactly as you have specified, so it is the difference
> between Fusarium vs mock for the two genotypes. This can be significant
> for a variety of reasons (e.g., C1 goes up in F vs M, C2 goes down in
> the same comp, or C1 doesn't change, but C2 either goes up or down, or
> C1 goes up or down and C2 doesn't change).
>
> So you are literally looking for any gene where C1F-C1M-C2F+C2M is
> significantly different from zero, regardless of how that came about. As
> for fold change, you could hypothetically filter on the fold change of
> one or the other 'subcontrasts', but in general I think people just
> filter on the sum of the contrast. In which case you cannot say anything
> about the underlying contrasts (e.g., C1F-C1M might have a 0.5 log FC,
> and C2F-C2M might have a -0.5 log FC, which sums to 1 in the
> interaction, but neither contrast is significant at a two-fold FC).
>
>
> > My problem is to interpret the contrasts correctly, besides the basic
> > contrasts. So basically whats the difference between this two contrasts:
> > Diff_C1C2=(C1F-C1M)-(C2F-C2M) and
> > Diff_C1C2_new = (C1F+C2F)/2-(C1M+C2M)/2 ?
>
> This is simple algebra, so don't over think things. We have gone over
> the interaction term, so let's think about the second contrast you are
> looking at. The first term is the average expression for the C1F and C2F
> samples, and the second term is the average of the C1M and C2M samples.
> And you are computing the difference between the two, so it is simply
> the difference between F and M, averaged over the C1 and C2 genotypes.
>
> This might be something of interest, if you either think the C1 and C2
> genotypes aren't really that different, or if you are looking for genes
> that react similarly in the two genotypes.
>
> Best,
>
> Jim
>
>
> > Will the second contrast just find significance if both groups are
> > either up or downregulated?
> > Thanks once more for your help!
> > All the best
> > Christian
> >
> > >>> "James W. MacDonald" <jmacdon at uw.edu> 7/18/2012 2:55 >>>
> > Hi Christian,
> >
> > On 7/17/2012 4:32 AM, Christian Ametz wrote:
> > > Dear members,
> > >
> > > I'm struggling to create the contrast matrix for our experiment
> > comprising of 4 genotypes in treated and untreated conditions:
> > >
> > > The genotypes are labelled C1 to C4
> > > The conditions are treated (F) and untreated (M)
> > > I set up the contrast matrix like this:
> > >
> > > cont.matrix.30<- makeContrasts(
> > > C1_FvsM=C1F-C1M,
> > > C2_FvsM=C2F-C2M,
> > > C3_FvsM=C3F-C3M,
> > > C4_FvsM=C4F-C4M,
> > > Diff_C1C2=(C1F-C1M)-(C2F-C2M),
> > > levels=design)
> > >
> > >
> > > The first four "normal" contrasts estimating the response of
> > treated/untreated genotypes are working great. My problem is the
> > "double contrast" Diff_C1C2 where I want to find those genes that show
> > a difference in response (treated vs untreated)  between the two
> > genotypes C1 and C2.
> >
> > The technical term for that contrast is an interaction, and you have set
> > it up correctly. So what is the problem?
> >
> > Best,
> >
> > Jim
> >
> >
> > >
> > >
> > > I would gladly accept any help!
> > >
> > > Thanks
> > > Christian
> > >
> > >
> > >
> > >
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list