[BioC] Filtering probes without annotation prior to statistical test

Mark Cowley m.cowley at garvan.org.au
Tue Jul 29 07:26:32 CEST 2008


Hi Seungwoo,
That type of filtering is definitely valid, and I have seen very
similar proportions of probesets with no annotation. However, the
number of probesets in group (3) changes with each new transcript.csv
file (the latest being labelled na26), implying that a probeset's
annotation status can change between versions: some that had
annotation in a previous version no longer do.

The only caveat with removing (3) is that there may be differentially
expressed "genes/somethings" among them that, with a little effort in
the form of aligning probe sequences, could reveal some interesting
novel biology.
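
To make the multiple-testing point concrete, here is a minimal sketch in
Python (not Bioconductor code). It assumes a Benjamini-Hochberg
correction, which nobody in this thread has named, and the p-values are
invented toy numbers; bh_adjust is a hypothetical helper written for
this example only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


ALPHA = 0.05

# Toy p-values, invented for illustration (not from any real array).
annotated = [0.0005, 0.004, 0.012, 0.03, 0.6]  # stands in for group (1)
other = [0.15, 0.35, 0.55, 0.75, 0.95]         # stands in for groups (2) and (3)

# Strategy A: test everything, then drop (2) and (3) from the DEG list.
adj_all = bh_adjust(annotated + other)[:len(annotated)]
n_all = sum(p < ALPHA for p in adj_all)

# Strategy B: drop (2) and (3) first, then adjust the remainder.
adj_filtered = bh_adjust(annotated)
n_filtered = sum(p < ALPHA for p in adj_filtered)

print(n_all, n_filtered)
```

With these toy numbers, 3 annotated probesets pass when everything is
tested and 4 pass when the filter is applied first. The size of the
effect on real data depends on how the p-values in the removed groups
are distributed, so treat this as an illustration rather than a
guarantee.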

cheers,
Mark

On 29/07/2008, at 12:37 PM, Seungwoo Hwang wrote:

> Dear all,
>
> I am analyzing data from Affymetrix Human Gene 1.0 ST Array.
>
> After inspecting its probe annotation file, it came to my attention
> that it contains a lot of probesets without transcript annotation, as
> follows:
>
> Total number of probesets: 33,298
> (1) Probesets with annotation: 24,409 (73%)
> (2) Control probesets: 4,201 (13%)
> (3) Probesets without any annotation: 4,688 (14%)
>
> I am thinking about filtering out the probesets (2) and (3) prior to
> statistical tests in order to reduce the total number of probesets
> that are subject to statistical tests. Doing so will make a big
> difference to the multiple testing correction, compared to doing
> statistical tests on all probesets (1), (2), and (3) followed by
> filtering out the probesets (2) and (3) from the DEG list.
>
> Is this type of filtering prior to statistical tests valid? Also,
> has anyone encountered a similar situation (dealing with array data
> with a lot of non-gene probes)?
>
> Thanks,
>
> Seungwoo
>
> ------------------------------------
> Seungwoo Hwang, Ph.D.
> Senior Research Scientist
> Korean Bioinformation Center  (http://www.kobic.re.kr)


