[BioC] How to pool subgroups for makeContrasts() and subsequent limma analysis?

Thu Feb 7 11:20:23 CET 2013

Dear James,

> Hi Rene,
>
> On 2/6/2013 11:29 AM, René wrote:
>> Dear James,
>>
>> I performed the pooled analysis as you suggested and compared the
>> results to a pure B - A comparison (no subgroups specified).
>> Interestingly, the two analyses give different results (497 vs 15
>> genes with log2FC >= 1 and p < 0.05). Could you explain this huge
>> difference?
>
> If I assume that by a pure B-A comparison you mean that you redefined
> your design matrix so that it has only three columns (A,B,C), and then
> did the B-A comparison, then it is simple to explain. I would also
> guess that the C-A comparison gives different results as well,
> depending on how you define your design matrix.
>
> Note that the test statistic for a contrast has the difference between
> the means of the two groups in the numerator and a measure of
> intra-group variability in the denominator. So in heuristic terms, the numerator 
> says how different the groups are, and the denominator tells you if 
> that difference is 'large' or not, by comparing to the within group 
> variability. So if the groups are really 'tight' then a small 
> difference in means might result in a significant test, but if the 
> groups are really variable then the mean differences have to be pretty 
> big as well to achieve significance.
>
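To check my understanding of this heuristic, here is a minimal sketch in plain Python. It uses an ordinary pooled-variance t statistic, not limma's moderated t, and the expression values and the function name pooled_t are made up for illustration:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Ordinary two-sample t statistic with pooled variance (not limma's moderated t)."""
    nx, ny = len(x), len(y)
    diff = mean(y) - mean(x)  # numerator: difference in group means
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = sqrt(pooled_var * (1 / nx + 1 / ny))  # denominator: within-group variability
    return diff, se, diff / se


# Hypothetical log2 expression values for a single gene.
a_tight = [5.0, 5.1, 4.9, 5.0]
b_tight = [5.5, 5.6, 5.4, 5.5]
a_noisy = [4.0, 6.0, 5.5, 4.5]
b_noisy = [4.5, 6.5, 6.0, 5.0]

diff_tight, se_tight, t_tight = pooled_t(a_tight, b_tight)
diff_noisy, se_noisy, t_noisy = pooled_t(a_noisy, b_noisy)

print(f"tight groups: diff={diff_tight:.2f}  se={se_tight:.3f}  t={t_tight:.2f}")
print(f"noisy groups: diff={diff_noisy:.2f}  se={se_noisy:.3f}  t={t_noisy:.2f}")
```

Both pairs of groups have the same difference in means, but the noisy pair has a much larger denominator and therefore a much smaller t statistic.
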
> How you define your groups has no bearing on the numerator, because
> (at least when the subgroups are of equal size) the difference is the
> same whether you compute B-A or (B1+B2+B3)/3-A. However, the
> denominator may well be quite different, depending on the B1, B2, and
> B3 groups.
>
> In the instance where you did (B1+B2+B3)/3-A, the intra-group 
> variability for the denominator is based on the variability within the 
> A, B1, B2, B3, and C groups. So if all the B-type groups are pretty 
> tight, then you will likely get more differentially expressed genes.
>
> If you do the 'pure' B-A comparison, then the denominator is based on 
> the intra-group variability of the A,B,C groups. If the B1, B2, B3 
> groups are pretty tight, but not really similar, then the combined B 
> group will be highly variable, so your denominator will tend to be 
> larger, resulting in fewer differentially expressed genes. Since the 
> denominator is the same for all contrasts, I would imagine the C-A 
> comparison has fewer genes as well.
>
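The scenario described here can be sketched with made-up numbers. The example below is plain Python, assumes equally sized groups, and uses the pooled within-group standard deviation of a group-means model as a stand-in for the denominator; it ignores the variance moderation that limma applies on top of this. The data and the helper name residual_sd are hypothetical:

```python
from math import sqrt
from statistics import mean


def residual_sd(groups):
    """Pooled within-group SD, i.e. the residual SD of a group-means model."""
    ss = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df = sum(len(g) for g in groups.values()) - len(groups)
    return sqrt(ss / df)


# Hypothetical log2 expression values for one gene: the B subgroups are
# each tight, but their means differ from one another.
fine = {
    "A": [5.0, 5.2, 4.8],
    "B1": [6.0, 6.1, 5.9],
    "B2": [7.0, 7.1, 6.9],
    "B3": [8.0, 8.1, 7.9],
    "C": [5.5, 5.6, 5.4],
}
# Same samples, with B1, B2 and B3 merged into a single B group.
coarse = {
    "A": fine["A"],
    "B": fine["B1"] + fine["B2"] + fine["B3"],
    "C": fine["C"],
}

# Numerators of the two contrasts.
num_fine = (mean(fine["B1"]) + mean(fine["B2"]) + mean(fine["B3"])) / 3 - mean(fine["A"])
num_coarse = mean(coarse["B"]) - mean(coarse["A"])

# Within-group variability under each design.
sd_fine = residual_sd(fine)
sd_coarse = residual_sd(coarse)

print(f"(B1+B2+B3)/3 - A: numerator={num_fine:.2f}  residual SD={sd_fine:.3f}")
print(f"B - A:            numerator={num_coarse:.2f}  residual SD={sd_coarse:.3f}")
```

With these values the numerator is the same for both designs, while the merged B group absorbs the differences between B1, B2 and B3 into its within-group variability, so the three-group design has the larger residual SD.
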
> Does that help?
>
> Best,
>
> Jim
>
>>
>> Best regards,
>> René

Thank you for your very detailed explanation. Unfortunately, I observe
the opposite: more genes are found when testing B - A than when testing
(B1+B2+B3)/3 - A.
If I understand your explanation correctly, this means that the
(B1+B2+B3)/3 - A test selects more stringently because of higher
variation between the subgroups. Relaxing the cutoff values would
therefore bring my list of genes back.
Unfortunately, I am doing a meta-analysis of two independent data sets
and I want to apply the same cutoff values to both. Relaxing the cutoffs
would in turn inflate my second result list (same group comparison,
i.e. B - A) from ~200 to a couple of thousand genes and thereby
introduce additional noise. Hence my question: is there a way to
combine the results of both comparisons? Or is there a way to correct
for the increased variance between the subgroups?

Best regards,
René