[BioC] Filtering by variance, IQR, etc.
Mark W Kimpel
mwkimpel at gmail.com
Tue Apr 3 06:36:29 CEST 2007
For some time I have been using what I consider to be non-biased filtering of
low-variance genes, following the method described in "Bioinformatics and
Computational Biology Solutions using R and Bioconductor", R. Gentleman,
et al., page 233. Recently I ran into some resistance from a colleague
who claims that this type of filtering distorts FDR calculations because it
introduces bias. His reasoning is that, since this method tends to filter out
genes with higher p values and/or lower fold changes, it is in effect a
sneaky way of removing non-significant genes before testing. Of course,
filtering by phenotype does introduce bias. In this case, however, the filter
rests on the a priori assumption that we simply aren't very interested in
low-variance genes for biological reasons: even if statistically significant,
they will have very small fold changes and thus be of questionable meaning.
I therefore believe we aren't violating the statistical underpinnings of the
analysis.
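The filter-then-adjust workflow described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the R/genefilter code from the book: the IQR cutoff of 0.5, the gene names and the toy data are hypothetical, and the Benjamini-Hochberg step is written out by hand rather than taken from any package. The property it is meant to show is that the filter looks only at each gene's overall spread across all arrays and never at the group labels. Whether that is enough to leave the FDR guarantee intact is exactly the question being asked; the sketch only shows the mechanics.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of one gene's expression values across all arrays."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def nonspecific_filter(expr, cutoff):
    """Keep genes whose IQR across ALL samples exceeds `cutoff`.

    Group/phenotype labels are never consulted, which is what makes the
    filter "non-specific". The cutoff value is a hypothetical choice.
    """
    return {gene: vals for gene, vals in expr.items() if iqr(vals) > cutoff}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): 6 arrays per gene, log2 scale.
expr = {
    "geneA": [5.0, 5.1, 5.0, 5.1, 5.0, 5.1],  # nearly flat across arrays
    "geneB": [4.0, 6.0, 4.5, 7.0, 3.5, 6.5],
    "geneC": [8.0, 8.2, 9.5, 10.0, 7.5, 9.0],
}
kept = nonspecific_filter(expr, cutoff=0.5)  # hypothetical cutoff
print(sorted(kept))  # ['geneB', 'geneC']

# P-values from a per-gene test on the kept genes only (values made up
# here for illustration) would then be adjusted for multiple testing.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Note that the multiple-testing correction is applied only over the genes that survive the filter, which is why the choice of filter matters to the FDR argument.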
I need some help justifying this filtering step. Does anyone know of
a peer-reviewed reference that gives a theoretical justification for its
use, or of any empirical experiments that show that it is legit?
Thanks,
Mark
--
Mark W. Kimpel MD
Neuroinformatics
Department of Psychiatry
Indiana University School of Medicine