[BioC] How to pool subgroups for makeContrasts() and subsequent limma analysis?
René
rene.boettcher86 at gmail.com
Thu Feb 7 11:20:23 CET 2013
Dear James,
> Hi Rene,
>
> On 2/6/2013 11:29 AM, René wrote:
>> Dear James,
>>
>> I performed the pooled analysis as you suggested and compared the
>> results to a pure B - A comparison (no subgroups specified).
>> Interestingly, both analyses give different results (497 vs 15 genes
>> with log2FC >= 1 and p < 0.05).
>> Could you explain this huge difference?
>
> If I assume that by a pure B-A comparison you redefined your design
> matrix so you only have three columns (A,B,C), and then did the B-A
> comparison, then it is simple to explain. I would also guess that the
> C-A comparison gives different results as well, depending on how you
> define your design matrix.
>
> Note that the test statistic for the contrast has the difference
> between the means of the two groups in the numerator and a measure of
> intra-group variability in the denominator. So in heuristic terms, the numerator
> says how different the groups are, and the denominator tells you if
> that difference is 'large' or not, by comparing to the within group
> variability. So if the groups are really 'tight' then a small
> difference in means might result in a significant test, but if the
> groups are really variable then the mean differences have to be pretty
> big as well to achieve significance.
>
> How you define your groups has no bearing on the numerator, because
> the difference of B-A is the same if you do B-A or if you do
> (B1+B2+B3)/3-A. However, the denominator may well be quite different,
> depending on the B1, B2, and B3 groups.
>
> In the instance where you did (B1+B2+B3)/3-A, the intra-group
> variability for the denominator is based on the variability within the
> A, B1, B2, B3, and C groups. So if all the B-type groups are pretty
> tight, then you will likely get more differentially expressed genes.
>
> If you do the 'pure' B-A comparison, then the denominator is based on
> the intra-group variability of the A,B,C groups. If the B1, B2, B3
> groups are pretty tight, but not really similar, then the combined B
> group will be highly variable, so your denominator will tend to be
> larger, resulting in fewer differentially expressed genes. Since the
> denominator is the same for all contrasts, I would imagine the C-A
> comparison has fewer genes as well.
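
To make the argument above concrete, here is a minimal Python sketch that
computes an ordinary pooled-variance t-statistic for one gene under both
designs. It is only an illustration: the expression values are invented, the
helper names pooled_sd and contrast_t are hypothetical, and it ignores
limma's empirical Bayes moderation of the variances. It assumes equally
sized subgroups, which is what makes the two numerators identical.

```python
from math import sqrt
from statistics import mean

# Hypothetical log2 expression values for a single gene (invented data).
# The B subgroups are each tight, but their means differ from one another.
groups = {
    "A":  [5.0, 5.2, 4.8],
    "B1": [6.0, 6.1, 5.9],
    "B2": [7.0, 7.1, 6.9],
    "B3": [8.0, 8.1, 7.9],
    "C":  [5.5, 5.4, 5.6],
}

def pooled_sd(samples):
    """Residual SD of a group-means model: within-group SS / residual df."""
    ss = sum((x - mean(g)) ** 2 for g in samples for x in g)
    df = sum(len(g) - 1 for g in samples)
    return sqrt(ss / df)

def contrast_t(weights, samples):
    """Return (estimate, standard error, t) for sum(w_i * mean_i)."""
    estimate = sum(w * mean(g) for w, g in zip(weights, samples))
    scale = sqrt(sum(w * w / len(g) for w, g in zip(weights, samples)))
    se = pooled_sd(samples) * scale
    return estimate, se, estimate / se

# Five-group design (A, B1, B2, B3, C): contrast (B1+B2+B3)/3 - A.
five = [groups[k] for k in ("A", "B1", "B2", "B3", "C")]
est5, se5, t5 = contrast_t([-1, 1 / 3, 1 / 3, 1 / 3, 0], five)

# Three-group design (A, B, C) with the B subgroups merged: contrast B - A.
b_all = groups["B1"] + groups["B2"] + groups["B3"]
three = [groups["A"], b_all, groups["C"]]
est3, se3, t3 = contrast_t([-1, 1, 0], three)

print(f"subgroup design: estimate={est5:.3f} se={se5:.3f} t={t5:.2f}")
print(f"pooled design:   estimate={est3:.3f} se={se3:.3f} t={t3:.2f}")
```

With these made-up values the two estimates agree, while the pooled design
has the larger standard error because the differences between B1, B2 and B3
end up in the residual variance. Data in which the subgroups are noisy but
similar could behave differently.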
>
> Does that help?
>
> Best,
>
> Jim
>
>>
>> Best regards,
>> René
Thank you for your very detailed explanation. Unfortunately, I observe
the opposite result: more genes are found when testing B - A than when
testing (B1+B2+B3)/3 - A.

If I understand your explanation correctly, this means that I select more
stringently in the case of (B1+B2+B3)/3 - A because of higher variation
between the subgroups. Lowering the cutoff values would therefore
correct my list of genes.

Unfortunately, I am doing a meta-analysis of two independent data sets
and I want to apply the same cutoff values to both. Lowering the cutoffs
would in turn increase my second result list (same group comparison,
i.e. B - A) from ~200 to a couple of thousand genes and thereby
introduce additional noise.

Hence my question: is there a way to combine the results of both
comparisons? Or is there a way to correct for the increased variance
between the subgroups?

Best regards,
René