[BioC] redundant probe sets in Affymetrix HG-U219
James W. MacDonald
jmacdon at med.umich.edu
Thu Apr 14 23:17:56 CEST 2011
Hi Andreas,
On 4/14/2011 5:27 AM, Andreas Heider wrote:
> Dear Bioconductor mailing list,
> is there a sensible way to deal with redundant probesets on Affymetrix chips
> like the HG-U219?
Define sensible.
There are some things you can do, but each comes with its own assumptions.
There is the findLargest() function in genefilter that will select the
probeset with the largest value of a test statistic. This assumes (among
other things) that all of the redundant probesets measure the same
thing. But note that the _x_ and _s_ in the probesets you list below
indicate that, when Affy designed that chip, those probesets
cross-hybridized with unrelated or related transcripts, respectively.
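
To make the selection idea concrete, here is a minimal base-R sketch of
"keep the probeset with the largest statistic per gene". It is not
genefilter's code, only an analogue of the idea. The fourth probeset ID,
the NM_EXAMPLE label and all of the statistics are made up for
illustration, and ranking on the absolute value is an assumption you may
or may not want.

```r
# Toy input: the first three IDs come from the question; the fourth ID,
# the "NM_EXAMPLE" label and every statistic are hypothetical.
probesets <- c("11715100_at", "11715101_s_at", "11715102_x_at", "11715103_at")
refseq    <- c("NM_003534", "NM_003534", "NM_003534", "NM_EXAMPLE")
tstat     <- c(2.1, -3.4, 0.7, 1.5)
names(tstat) <- probesets

# For each transcript ID, return the name of the probeset whose statistic
# is largest in absolute value (assumption: sign does not matter).
pick_largest <- function(stat, id) {
  idx_by_id <- split(seq_along(stat), id)
  vapply(idx_by_id,
         function(i) names(stat)[i[which.max(abs(stat[i]))]],
         character(1))
}

selected <- pick_largest(tstat, refseq)
print(selected)
```

The result is a character vector named by transcript ID, which you can
use to subset the rows of an expression matrix.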
You can use the MBNI re-mapped cdfs, which take current versions of the
genome and filter out probes that don't uniquely hybridize to the
genome, and then map probes to probesets based on e.g., Entrez Gene IDs.
This eliminates the problem of multiple probesets, but you then have to
contend with probesets that vary from ~3 probes up to 100 or more. As
you can imagine, the probesets with 3 probes will have much larger
standard errors than those with say 100 probes. This makes downstream
analyses more difficult unless you choose to simply ignore that fact.
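
As a rough feel for the size of that effect, here is a back-of-the-envelope
sketch. It assumes independent probes with a common standard deviation and
a plain mean as the summary, neither of which holds exactly for real
probe-level summaries, so treat the number as an order of magnitude only.
The function name se_mean and the value of sigma are arbitrary choices
for the illustration.

```r
# Standard error of the mean of n independent values with common SD sigma.
se_mean <- function(sigma, n) sigma / sqrt(n)

sigma    <- 1                     # assumed per-probe SD, arbitrary units
se_small <- se_mean(sigma, 3)     # probeset built from 3 probes
se_large <- se_mean(sigma, 100)   # probeset built from 100 probes

# Under these assumptions the ratio is sqrt(100 / 3), i.e. roughly 5.8.
ratio <- se_small / se_large
print(c(se_small = se_small, se_large = se_large, ratio = ratio))
```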
You could ignore the fact that you have multiple probesets that may or
may not be measuring the same thing, and assume independence (which, of
course, isn't even true when you have no redundant probesets).
No real satisfying alternatives, IMO, so you have to pick your poison.
Best,
Jim
> For example:
>
> Probe Set ID     RefSeq Transcript ID
> 11715100_at      NM_003534
> 11715101_s_at    NM_003534
> 11715102_x_at    NM_003534
>
> Should I get the median/mean of the expression intensities? Or select the
> highest? And what would be the procedure in R to do it? I mean, how do I tell
> R to return the median of the expression values if there is more than one
> probeset for a single RefSeq ID?
>
> I hope you can help me, Andreas
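
On the mechanics of the median question above, a minimal base-R sketch
follows. It assumes you already have a numeric matrix of expression
values (probesets in rows, arrays in columns) and a vector giving the
RefSeq ID for each row. The matrix values, the fourth probeset ID and
the NM_EXAMPLE label are made up, and the caveats about whether the
probesets measure the same thing still apply.

```r
# Hypothetical expression matrix: rows are probesets, columns are arrays.
expr <- matrix(c(7.0, 7.2,
                 8.0, 8.4,
                 9.0, 9.1,
                 5.0, 5.5),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("11715100_at", "11715101_s_at",
                                 "11715102_x_at", "11715103_at"),
                               c("array1", "array2")))
# One RefSeq ID per row of expr ("NM_EXAMPLE" is a placeholder).
refseq <- c("NM_003534", "NM_003534", "NM_003534", "NM_EXAMPLE")

# Group row indices by RefSeq ID, then take the per-array median in each group.
groups <- split(seq_len(nrow(expr)), refseq)
collapsed <- t(vapply(groups,
                      function(i) apply(expr[i, , drop = FALSE], 2, median),
                      numeric(ncol(expr))))
print(collapsed)
```

Swapping median for mean or max in the apply() call gives the other
summaries asked about.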
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826