[BioC] Filtering by variance, IQR, etc.

Tue Apr 3 06:36:29 CEST 2007

For some time I have been using what I consider to be unbiased filtering
of low-variance genes, following the method described in "Bioinformatics
and Computational Biology Solutions using R and Bioconductor", R.
Gentleman, et al., page 233. Recently I have run into some resistance
from a colleague who claims that this type of filtering distorts FDR
calculations because it introduces bias. His reasoning is that, since
this method tends to filter out genes with higher p values and/or lower
fold changes, it is a sneaky way of pre-selecting genes that are more
likely to come out significant. Of course, filtering by phenotype does
introduce bias. In this case, however, the filter rests on the a priori
assumption that, for biological reasons, we just aren't that interested
in low-variance genes (even if statistically significant, they will have
very low fold changes and thus be of questionable meaning). I believe
that filtering on that basis does not violate the statistical
underpinnings of the analysis.
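To make the two steps under discussion concrete, here is a minimal
sketch in Python rather than R. It is not the book's code. It assumes
the filter is a per-gene IQR cutoff computed across all arrays without
ever looking at the class labels, followed by a Benjamini-Hochberg
adjustment on the genes that survive. The function names
(nonspecific_filter, benjamini_hochberg), the keep_fraction=0.5 cutoff,
and all the numbers are hypothetical and chosen only for illustration.

```python
import statistics


def iqr(values):
    """Interquartile range of one gene across ALL arrays (labels ignored)."""
    # statistics.quantiles needs Python 3.8+; its default interpolation
    # may differ from R's IQR(), so cutoffs will not match R exactly.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def nonspecific_filter(expr, keep_fraction=0.5):
    """Return ids of genes whose IQR ranks in the top `keep_fraction`.

    `expr` maps a gene id to its expression values, one per array.
    No phenotype/class labels are passed in, so they cannot influence
    which genes survive.
    """
    ranked = sorted(expr, key=lambda g: iqr(expr[g]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


def benjamini_hochberg(pvalues):
    """BH step-up adjusted p-values; `pvalues` maps gene id -> raw p."""
    m = len(pvalues)  # number of tests = only the genes that survived
    order = sorted(pvalues, key=pvalues.get)  # ascending raw p
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


# Toy data: six arrays per gene, values invented for illustration only.
expr = {
    "g1": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],  # nearly flat
    "g2": [2.0, 6.0, 3.0, 7.0, 2.5, 6.5],  # variable
    "g3": [8.0, 8.1, 8.0, 7.9, 8.0, 8.1],  # nearly flat
    "g4": [1.0, 4.0, 9.0, 2.0, 7.0, 5.0],  # variable
}
kept = nonspecific_filter(expr, keep_fraction=0.5)
print(sorted(kept))  # ['g2', 'g4']

# Hypothetical raw p-values from some two-group test on the survivors.
raw_p = {"g2": 0.01, "g4": 0.04}
print(benjamini_hochberg(raw_p))
```

Note that m in the BH step counts only the genes that passed the filter,
so filtering changes the correction itself. The sketch only shows where
that happens; it does not settle whether FDR control is preserved, which
is the question being asked.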

I need some help in justifying this filtering step. Does anyone know of
a peer-reviewed reference that gives a theoretical justification for its
use, or of any empirical experiments that show that it is legit?

Thanks,
Mark
-- 
Mark W. Kimpel MD
Neuroinformatics
Department of Psychiatry
Indiana University School of Medicine