[BioC] [limma] [Rfit] [samr] Gene expression distribution using lmFit and eBayes

Jérôme Lane jerome.lane at criucpq.ulaval.ca
Sat Nov 23 23:07:43 CET 2013


Dear Gordon,

Thanks for the reply; I appreciate it very much.

Is there a way to assess the "quality" of normalization for genes?

Best regards,
Jerome

On 2013-11-22 20:55, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:

>Dear Jerome,
>
>The Shapiro test is only applicable to iid samples, so it is difficult to
>see how it could be used to test normality of expression values in a
>linear modelling context.  If you have applied the test to the normalized
>expression values for each gene, then I suspect that the test is actually
>picking up differential expression rather than non-normality.
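A minimal sketch of this point in base R, using simulated data. The group sizes, effect size, and seed are assumptions chosen for illustration, not values from this thread. One gene is simulated with normal errors and a large group difference. The Shapiro test on the pooled expression values reacts to the two-component mixture created by the group effect, whereas the residuals from the linear model are the quantity the normality assumption actually concerns. Residuals are themselves only approximately iid, so this is a rough diagnostic rather than a formal test.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
group <- factor(rep(c("A", "B"), each = 50))  # hypothetical two-group design

# Normal errors (sd = 1) plus a strong group effect of 6 units in group B
y <- rnorm(100) + ifelse(group == "B", 6, 0)

# Shapiro test on the pooled values: sees a mixture of two normals
p_raw <- shapiro.test(y)$p.value

# Shapiro test on residuals after removing the group effect
p_resid <- shapiro.test(residuals(lm(y ~ group)))$p.value

c(raw = p_raw, residuals = p_resid)
```

With a design like this, a very small p-value for the pooled values reflects differential expression, not non-normal errors.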
>
>The limma code is very robust against non-normality.  All the usual
>microarray platforms and standard preprocessing procedures produce data
>that is normally distributed to a good enough approximation.  Much effort
>has been devoted to developing good preprocessing and normalization
>algorithms.
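The robustness claim can be explored with a small simulation. This is only a sketch: it uses the ordinary two-sample t-test from base R rather than limma's moderated t-statistic, and the exponential distribution, sample size, and number of simulations are assumptions picked for illustration. Both groups are drawn from the same skewed distribution, so any rejection is a false positive.

```r
set.seed(2)     # arbitrary seed
n_sim <- 2000   # number of simulated null "genes" (assumed)
n     <- 10     # samples per group (assumed)

# Both groups come from the same exponential (strongly skewed) distribution,
# so the null hypothesis of equal means is true in every simulation.
p <- replicate(n_sim, t.test(rexp(n), rexp(n), var.equal = TRUE)$p.value)

# Empirical type I error rate at the nominal 5% level
fpr <- mean(p < 0.05)
fpr
```

If the test is robust in the sense described here, `fpr` should stay near the nominal 0.05 even though the data are far from normal.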
>
>The concept of "robustness" in statistical analysis goes back to a 1953
>paper by George Box in Biometrika.  In that paper, Box wrote of the "remarkable
>property of robustness to non-normality which [tests for comparing means]
>possess".  The tests done by limma inherit the robustness property that
>Box was referring to.  Box made the point that the robustness of the
>two-sample t-test was not improved by checking first for equal variances.
>He said:
>
>"To make the preliminary test on variances is rather like putting to sea
>in a rowing boat to find out whether conditions are sufficiently calm for
>an ocean liner to leave port!"
>
>I rather think that, if Box were still alive today, he might say something
>similar about a preliminary Shapiro test!
>
>Best wishes
>Gordon
>
>> Date: Thu, 21 Nov 2013 17:42:21 -0500
>> From: Jerome Lane <jerome.lane at criucpq.ulaval.ca>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] [limma] [Rfit] [samr] Gene expression distribution
>> 	using lmFit and eBayes
>>
>>   Hi,
>>
>>   Three quarters of my microarray genes have non-normally distributed
>>   expression values, with most Shapiro test p-values below 10^-5.
>>
>>   I tried rank-based linear regression with rfit from the Rfit package
>>   (no normality assumption for the residuals) to adjust for covariates,
>>   followed by SAM (non-parametric) from the samr package, but the results
>>   were not as biologically relevant as those from lmFit + eBayes.
>>
>>   I know that the lmFit function can analyse gene expression data that
>>   are not strictly normal, but what is the limit?
>>
>>   Is it statistically valid to use lmFit + eBayes on my data?
>>
>>   Best regards,
>>
>>   Jerome Lane
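For reference, the lmFit + eBayes analysis with covariate adjustment mentioned in the question might look roughly like the sketch below. Everything here is simulated: the expression matrix `expr`, the factor `group`, and the covariate `age` are hypothetical stand-ins, not the poster's data. The sketch also extracts the gene-wise residuals, which are a more natural target for distributional checks than the raw expression values.

```r
# Sketch only: requires the Bioconductor package limma; all data are simulated.
if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)
  set.seed(3)
  n_genes <- 200
  n_samples <- 12

  # Hypothetical log-expression matrix: genes in rows, samples in columns
  expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
                 dimnames = list(paste0("gene", 1:n_genes),
                                 paste0("s", 1:n_samples)))
  group <- factor(rep(c("ctrl", "case"), each = 6), levels = c("ctrl", "case"))
  age   <- runif(n_samples, 30, 70)        # hypothetical covariate

  design <- model.matrix(~ group + age)    # group effect adjusted for age

  fit <- eBayes(lmFit(expr, design))       # gene-wise linear models, moderated t
  top <- topTable(fit, coef = "groupcase", number = 5)

  # Genes x samples matrix of residuals from the fitted linear models
  res <- residuals(fit, expr)

  # Visual check of the residuals for a single gene
  qqnorm(res["gene1", ])
  qqline(res["gene1", ])
}
```

The normal Q-Q plot is a visual aid only; with a handful of samples per gene it cannot establish or rule out normality.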
>


