[BioC] Analyzing multiple tissues

Wed Jun 8 14:38:44 CEST 2005

I agree completely with Gordon's comments.

The less replication you have, the more you need to rely on the statistical 
model.  The basic model says that the variance does not change with the 
experimental condition.  If you take this seriously, you could replicate in 
the "least expensive" condition.  But it is a very strong assumption.  An 
even stronger assumption is that ALL the genes have the same variance for 
the same condition - this is equivalent to selecting a "fold change" 
criterion.
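
A minimal sketch of why a single shared variance reduces to a fold-change 
criterion (Python; the gene names, M-values, common standard deviation and 
array counts are all made up for illustration, not taken from any data in 
this thread).  If every gene has the same standard deviation, the statistic 
is just the M-value divided by one constant, so the ranking cannot differ 
from the ranking by fold change:

```python
import math

# Hypothetical log-ratios (M-values) between two conditions for five genes.
m_values = {"geneA": 2.1, "geneB": -0.4, "geneC": 1.3, "geneD": -3.0, "geneE": 0.2}


def t_common_variance(m, sigma, n1, n2):
    """t-like statistic when every gene shares the same standard deviation."""
    return m / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


sigma = 0.5    # assumed common standard deviation (illustrative value)
n1, n2 = 3, 1  # arrays per condition (illustrative values)

t_stats = {g: t_common_variance(m, sigma, n1, n2) for g, m in m_values.items()}

# Rank genes by absolute fold change and by absolute t-like statistic.
rank_by_fold_change = sorted(m_values, key=lambda g: abs(m_values[g]), reverse=True)
rank_by_t = sorted(t_stats, key=lambda g: abs(t_stats[g]), reverse=True)

print(rank_by_fold_change)
print(rank_by_t)
```

The two printed orderings are identical whatever value is chosen for sigma, 
which is the sense in which the assumption is equivalent to picking a 
fold-change cutoff.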

For example, do you really believe that the variability of the differentially 
expressed genes is the same in "normal" and "cancerous" tissue, or in 
"developing" and "mature" organs?

What I tell people is that if you don't plan to replicate all your 
conditions, you should plan to have a statistician as a collaborator.  Then 
at least you have someone who knows how to check the sensitivity of the 
analysis to the assumptions made, and possibly check the validity of the 
assumptions.  In addition, statistical optimal design can help in designing 
experiments with higher statistical power for the same cost (although 
sometimes the obvious design has the highest power), or in cutting your 
cost.  I just reduced the array budget of one experiment by 25% with a very 
modest change in the design.  Along with the reduction in the number of 
arrays, the number of animals was similarly reduced.

--Naomi

At 10:46 PM 6/7/2005, Gordon Smyth wrote:

>>David Kipling KiplingD at cardiff.ac.uk
>>Tue Jun 7 15:09:17 CEST 2005
>>
>>Dear Naomi, Gordon and Uri,
>>
>>If I might try to bring together Naomi's comments with those of Gordon and
>>see if I have followed this correctly:
>>
>>Uri's original design is:
>>
>>Cardiac1, Cardiac2, Cardiac3
>>Skeletal1, Skeletal2
>>MSC
>>
>>That is, a 3x2x1 6-chip experiment.
>>
>>Naomi commented that with no replication (i.e. the single MSC chip) one
>>cannot judge biological variation and the best thing to do is a simple
>>fold-change:  "...there is no statistically valid means of analyzing your
>>data that improves on an arbitrary choice of 'fold difference', such as
>>2-fold difference." {Naomi}
>
>There isn't any conflict between Naomi's comments and my own. Naomi 
>actually referred to "biological replication" rather than to replication 
>per se. She was reacting to Uri's original post which made it very unclear 
>whether there is any biological replication in his experiment at all, 
>i.e., it may be that Cardiac1, Cardiac2 etc are not in fact biological 
>replicates. Replication is a subtle business, and Uri would need to 
>describe his process and population in much more detail than he has done for 
>more to be said. I may be wrong, but I doubt that Naomi was especially 
>concerned about the single MSC chip.
>
>On the other hand, my comments were addressed at your mock experiment and 
>were made on the basis that all replication for states 1 and 2 is true 
>biological replication.
>
>>Then Gordon replied:
>>
>> >> Out of curiosity, what is limma doing here and how should one interpret
>> >> these t stats/p-values (if indeed one should!)?  Are they any use over
>> >> simple M values?
>> >
>> > Yes, they are almost always better than simply using fold changes.  Using
>> > M-values alone would make no use of replication while the t-statistics
>> > make use of whatever replication is available.  Put very simply, some
>> > replication is better than none.
>> >
>> > You seem to be concerned in your mock experiment that one of the states
>> > has no replication.  The limma analysis estimates the variance for each
>> > gene from the replicates available for states 1 and 2 and applies that
>> > estimate to state 3 as well.  This analysis is perfectly valid provided
>> > that the variability of the expression values is similar in state 3 to
>> > that in states 1 and 2.
>> >
>> > Even when the variability is different in state 3, the limma analysis
>> > still gives a better ranking than fold change, even for comparisons
>> > involving state 3, in most cases.  The basic assumption is that, across
>> > genes, the variance in state 3 is positively associated with the variance
>> > in states 1 and 2.  This is a very weak assumption which is almost always
>> > true in practice, as genewise differences in variability tend to dominate
>> > state-wise differences.
>
>All my comments below are made on the basis that all replication for 
>states 1 and 2 is biological replication.
>
>>If I follow Gordon correctly, his argument is that in an experimental design
>>like this you can make an estimate for the variance of a probeset based on
>>its behaviour in the other samples (with some opportunity for discussion as
>>to how valid an assumption this is!).   This results in a situation where
>>not all fold changes are equal, and this will actually work better than a
>>simple FC estimate for ranking the genes for further exploration.
>
>Yes.
>
>>In other words, in the 3x2x1 design such as this you could get two probesets
>>that had identical M values (calculated between the triplicate and single
>>chips) *but* limma would rank the probeset with the higher overall
>>variability across the six chips lower down the list (seen as a different
>>p-value/t statistic).
>
>Exactly.
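
The point above can be sketched with an ordinary pooled-variance t-statistic 
(Python; the expression values, probeset labels and the helper pooled_t are 
hypothetical).  This is not limma's moderated t-statistic, which additionally 
shrinks the genewise variances by empirical Bayes; it only illustrates 
borrowing the variance from the replicated groups for a comparison against 
a single chip:

```python
import math
from statistics import mean, variance


def pooled_t(replicated_groups, single_value, compare_group):
    """Ordinary pooled-variance t for one probeset (hypothetical helper).

    The variance is estimated only from the groups that have replicates and
    is then applied to the comparison against the single unreplicated chip.
    """
    df = sum(len(g) - 1 for g in replicated_groups.values())
    ss = sum((len(g) - 1) * variance(g) for g in replicated_groups.values())
    s2 = ss / df  # pooled within-group variance
    grp = replicated_groups[compare_group]
    m = mean(grp) - single_value  # M value: group mean minus single chip
    se = math.sqrt(s2 * (1.0 / len(grp) + 1.0))  # the single chip has n = 1
    return m, m / se, df


# Made-up log2 expression values for two probesets in a 3+2+1 design.
quiet = {"cardiac": [8.0, 8.1, 7.9], "skeletal": [7.5, 7.6]}
noisy = {"cardiac": [7.0, 8.0, 9.0], "skeletal": [7.0, 8.0]}
msc = 7.0  # the single MSC chip, same value for both probesets

m_quiet, t_quiet, df_quiet = pooled_t(quiet, msc, "cardiac")
m_noisy, t_noisy, df_noisy = pooled_t(noisy, msc, "cardiac")

print(f"quiet probeset: M = {m_quiet:.2f}, t = {t_quiet:.2f}, df = {df_quiet}")
print(f"noisy probeset: M = {m_noisy:.2f}, t = {t_noisy:.2f}, df = {df_noisy}")
```

Both probesets have the same M value for Cardiac versus MSC, but the noisier 
one gets the smaller t-statistic and so ranks lower.
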
>
>>So Uri could use limma to study this 3x2x1 design and be able to extract
>>potentially differentially regulated genes between the (single) MSC sample
>>and either/both of the other two sample classes using the limma p-values
>>returned, and this would be a more powerful approach than simple
>>fold-changes - yes?
>
>His design is actually 3+2+1 rather than 3x2x1. If his Cardiac and 
>Skeletal samples are biological replicates, then yes. If not, see Naomi's 
>comments.
>
>>This is a very interesting point for those of us in core facilities having
>>to help users who insist - for reasons of finances, scarce samples, or the
>>fact that the experiments are of a preliminary grant-generating nature - on
>>doing small-scale experiments where some samples have no replication at all.
>>Me telling them to go away and come back with 15-fold replication isn't
>>particularly helpful(!), and instead suggestions as to how to wring the
>>maximum information from such narrow datasets are what they need.
>
>Making the best use of small-scale experiments is the primary purpose of 
>the limma software.
>
>In general, you can still do an analysis with only 1 chip for one of the 
>groups, unless you have a strong reason to think that the variability of 
>expression will be quite different in that group to the others. Generally 
>speaking, the process will work best when the different groups (e.g., 
>tissue types) are as similar as possible.
>
>Gordon
>
>>Thanks everyone,
>>
>>David
>>
>>
>>
>>
>>Professor David Kipling
>>Department of Pathology
>>School of Medicine
>>Cardiff University
>>Heath Park
>>Cardiff CF14 4XN
>>
>>Tel:    +44 29 2074 4847
>>Fax:    +44 29 2074 4276
>>Email:  KiplingD at cardiff.ac.uk
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111