[BioC] what's really in hgu133plus2.db?

Michal Okoniewski michal.okoniewski at fgcz.ethz.ch
Sat Feb 19 11:15:21 CET 2011


Hey Tim, 

I discovered the old chips in the XMAP browser yesterday, while looking up genes with people who still use the old arrays... a cool "legacy" feature :) 
Are the HG-U133Plus2 (etc.) hits also in the xmapcore database, by chance?

Cheers, 
Michal
________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Tim Yates [TYates at picr.man.ac.uk]
Sent: Saturday, February 19, 2011 12:03 AM
To: d.e.iles at leeds.ac.uk
Cc: mailman, bioconductor
Subject: Re: [BioC] what's really in hgu133plus2.db?

We map the hgu133plus2 array to Ensembl as part of the xmapcore package.

The mappings can be seen in the XMAP browser:

http://xmap.picr.man.ac.uk/?a=HG-U133Plus2&ch=17&lay=gene&q=Tp53

Or you can install the human xmapcore database (from the downloads page of that site) into a local copy of MySQL, install the xmapcore package from Bioconductor, and map probesets to exons, transcripts or genes.
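For a feel of what mapping probesets to exons, transcripts or genes involves, here is a minimal Python sketch. It is not xmapcore's API (xmapcore is an R package backed by MySQL); it assumes each probeset has already been placed on the genome as one or more intervals, and that each exon record carries its transcript and gene IDs. The names `Exon`, `overlaps` and `map_probeset`, and all IDs and coordinates, are hypothetical toy values.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    """One exon record; coordinates are 1-based and inclusive (an assumption)."""
    exon_id: str
    chrom: str
    start: int
    end: int
    transcript_id: str
    gene_id: str


def overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """True if two closed intervals lie on the same chromosome and share a base."""
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


def map_probeset(hits, exons):
    """Roll one probeset's genomic hits up to exons, transcripts and genes.

    hits: list of (chrom, start, end) tuples for a single probeset (hypothetical input).
    Empty lists in the result mean the probeset touches no annotated exon.
    """
    exon_ids, transcript_ids, gene_ids = set(), set(), set()
    for chrom, start, end in hits:
        for exon in exons:  # linear scan, for clarity only
            if overlaps(chrom, start, end, exon.chrom, exon.start, exon.end):
                exon_ids.add(exon.exon_id)
                transcript_ids.add(exon.transcript_id)
                gene_ids.add(exon.gene_id)
    return {
        "exons": sorted(exon_ids),
        "transcripts": sorted(transcript_ids),
        "genes": sorted(gene_ids),
    }


# Hypothetical toy annotation: gene G1 has transcripts T1 and T2, which share exon E2.
EXONS = [
    Exon("E1", "17", 100, 200, "T1", "G1"),
    Exon("E2", "17", 300, 400, "T1", "G1"),
    Exon("E2", "17", 300, 400, "T2", "G1"),
    Exon("E3", "17", 500, 600, "T2", "G1"),
]

if __name__ == "__main__":
    # A probeset hit inside the shared exon maps to both transcripts of G1.
    print(map_probeset([("17", 350, 374)], EXONS))
    # -> {'exons': ['E2'], 'transcripts': ['T1', 'T2'], 'genes': ['G1']}
    # A hit outside every exon maps to nothing.
    print(map_probeset([("17", 900, 924)], EXONS))
    # -> {'exons': [], 'transcripts': [], 'genes': []}
```

The linear scan over exons is only for readability; a database would answer the same question with an indexed range query.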

Just another option,

Tim



----- Reply message -----
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Date: Fri, Feb 18, 2011 20:42
Subject: [BioC] what's really in hgu133plus2.db?
To: "David Iles" <D.E.Iles at leeds.ac.uk>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>

Hi David,

On 2/18/2011 2:58 PM, David Iles wrote:
> Jim,
>
> Thanks for your response. The point of understanding exactly where a
> probeset is located is of fundamental importance because it is now
> clear from the ENCODE project that around 90% of the genome is
> actively transcribed in a regulated way (John Mattick presented an
> excellent talk introducing this topic at the HGM2007 meeting in
> Montreal). The question then is: 'is it mRNA or another (regulatory?)
> RNA species that we are measuring?' The fact that 'orphaned'
> probesets detect significantly up- or down-regulated transcription is
> extremely interesting and should not be ignored just because they now
> map outside 'genes' (whatever they may be: the human GNAS locus
> generates 59 different transcripts, some of which do not overlap).

Which is the gist of my original question to you. The annotation
packages we provide take the original manufacturer at their word and
simply map the intended target to other annotation sources. Therefore,
if you are interested in 'non-traditional' (for lack of a better term)
transcripts, then the updated status of the annotation databases isn't
relevant.

However, the packages that have been developed for next-gen sequencing
may be of interest. The Biostrings and BSgenome.Hsapiens.UCSC.hgXX
packages will allow you to very quickly align the probe sequences of all
the probesets to the genome of your choice. Then, depending on how you want
to proceed, packages like rtracklayer, GenomicFeatures, GenomicRanges, etc. can help discern
known transcripts from possible 'other' RNA species.
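A minimal sketch of that workflow, written in Python purely for illustration. The naive substring scan stands in for what Biostrings does when matching a dictionary of probe sequences against a BSgenome object, and the plain interval test stands in for GenomicRanges overlap queries; none of the function names below come from those packages. The genome, the 8-base probe and the transcript interval are hypothetical toy data (real HG-U133 Plus 2 probes are 25-mers).

```python
# Map DNA bases to their complements (uppercase and lowercase).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact_hits(probe, genome):
    """Return (start, end, strand) for every exact match of probe on either strand.

    Coordinates are 1-based, inclusive, on the forward strand. A naive scan that
    stands in for a proper dictionary matcher; only suitable for toy inputs.
    """
    genome = genome.upper()
    probe = probe.upper()
    width = len(probe)
    hits = []
    for strand, query in (("+", probe), ("-", reverse_complement(probe))):
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + width, strand))
            pos = genome.find(query, pos + 1)  # allow overlapping matches
    return hits


def classify_hits(hits, transcripts):
    """Label each hit with the transcripts it overlaps, or as 'orphaned'.

    transcripts: dict of transcript ID -> (start, end), same coordinate system.
    Strand is ignored in the overlap test.
    """
    results = []
    for start, end, strand in hits:
        ids = sorted(
            tid for tid, (t_start, t_end) in transcripts.items()
            if start <= t_end and t_start <= end
        )
        results.append({
            "start": start,
            "end": end,
            "strand": strand,
            "transcripts": ids,
            "status": "known transcript" if ids else "orphaned",
        })
    return results


# Hypothetical toy data: the probe matches once on each strand, and only the
# first match falls inside the single annotated transcript.
GENOME = "AAAAA" + "CCGGTTAG" + "AAAAAAAAAA" + "CTAACCGG" + "AAAAA"
PROBE = "CCGGTTAG"
TRANSCRIPTS = {"TX1": (1, 15)}

if __name__ == "__main__":
    for hit in classify_hits(find_exact_hits(PROBE, GENOME), TRANSCRIPTS):
        print(hit["start"], hit["end"], hit["strand"], hit["status"], hit["transcripts"])
    # -> 6 13 + known transcript ['TX1']
    # -> 24 31 - orphaned []
```

Hits that overlap no annotated transcript come back as 'orphaned'; those are the candidates for the 'other' RNA species David mentions. The strand is reported but not used in the overlap test, since how it relates to transcript strand depends on whether the probe sequences are supplied as sense or antisense, which this sketch does not assume.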

Best,

Jim


>
> Dave
>
> Dr David Iles
> Institute for Integrative and Comparative Biology
> University of Leeds
> Leeds LS2 9JT
>
> d.e.iles at leeds.ac.uk
>
>
>
>
> On 18 Feb 2011, at 19:24, James W. MacDonald wrote:
>
>> Hi David,
>>
>> On 2/18/2011 11:41 AM, David Iles wrote:
>>> Dear All,
>>>
>>> Can anyone point me to a URL where I can obtain an overview of
>>> the sources of the data incorporated in the current version of
>>> hgu133plus2.db? I saw to my horror that the actual probesets are
>>> based on a really obsolete human genome assembly (2003), which
>>> has changed significantly over the years. As have also genes,
>>> gene locations, genomic intervals, RefSeq/UniGene entries
>>> etcetcetc......
>>
>> So what exactly is the question? As you note, the chip was designed
>> in the early 2000s, so was necessarily based on a (now) old
>> version of the UniGene database. That is the downfall of the
>> expression arrays; they are stale almost from the instant they hit
>> the market.
>>
>> Since the probesets are based on things that may now be different,
>> it is to a certain extent irrelevant how current the hgu133plus2.db
>> data are, because the probeset -->  gene mappings may be suspect.
>> You can update the gene info all you want, but if the probeset
>> doesn't actually measure a given transcript, then what is the
>> point?
>>
>> We base the annotation on the probeset --> Entrez Gene mappings
>> supplied by Affymetrix, which are supposed to be updated regularly.
>> Not having checked that (and given the fact that we take no stance
>> on the veracity of these mappings), they are what they are. Any
>> significant results will require close inspection of the probesets
>> to determine if you believe that they measure what they purport to
>> measure.
>>
>> As an alternative, you can try the MBNI re-mapped probesets, which
>> both update the mappings and remove replicate probesets (by
>> creating single probesets per gene/transcript/etc). They can be
>> obtained via biocLite, or individually here:
>>
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
>>
>>
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> Thanks
>>>
>>> Dave
>>>
>>> Dr David Iles
>>> Institute for Integrative and Comparative Biology
>>> University of Leeds
>>> Leeds LS2 9JT
>>>
>>> d.e.iles at leeds.ac.uk
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and
>> should not be used for urgent or sensitive issues
>>
>

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
