[BioC] about source to make HGU133plus2 package
Nianhua Li
nli at fhcrc.org
Mon Apr 30 23:24:11 CEST 2007
Hi, Greg,
We first map probeset IDs to Entrez Gene IDs and then extract other
annotations by using the Entrez Gene IDs. Here are some details for
hgu133plus2 v1.16.0 (the one in the bioc2.0 release).
The ACCNUM environment (hgu133plus2ACCNUM) was extracted from
Affymetrix's annotation file (dated 11/15/2006, the "Representative
Public ID" column). It is used to get the probeset to Entrez Gene
mapping via data from Entrez Gene (GenBank to Entrez mapping, dated
2/28/2007) and UniGene (GenBank to UniGene mapping and UniGene to
Entrez mapping, dated 2/26/2007). For probeset IDs that remain
unmapped, we use the probeset to Entrez mapping in Affymetrix's
annotation file as a supplement. The result is the ENTREZID
environment.
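The mapping steps above can be sketched roughly as follows. This is
only an illustration in Python, not the code used to build the package.
The function name, the argument names, and the toy IDs are all made up,
and the order in which the Entrez Gene and UniGene routes are tried is
an assumption (the description above does not say how the two are
combined).

```python
from typing import Dict, Optional


def map_probesets_to_entrez(
    probe_to_acc: Dict[str, str],
    acc_to_entrez: Dict[str, str],
    acc_to_unigene: Dict[str, str],
    unigene_to_entrez: Dict[str, str],
    affy_probe_to_entrez: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Map each probeset ID to an Entrez Gene ID, or None if unmapped."""
    result: Dict[str, Optional[str]] = {}
    for probe, acc in probe_to_acc.items():
        # Route 1 (assumed to be tried first): GenBank accession -> Entrez Gene.
        entrez = acc_to_entrez.get(acc)
        # Route 2: GenBank accession -> UniGene cluster -> Entrez Gene.
        if entrez is None:
            cluster = acc_to_unigene.get(acc)
            if cluster is not None:
                entrez = unigene_to_entrez.get(cluster)
        # Supplement: the vendor's own probeset -> Entrez Gene column.
        if entrez is None:
            entrez = affy_probe_to_entrez.get(probe)
        result[probe] = entrez
    return result


# Toy data; every ID here is invented for illustration.
probe_to_acc = {"p1": "ACC1", "p2": "ACC2", "p3": "ACC3", "p4": "ACC4"}
acc_to_entrez = {"ACC1": "101"}
acc_to_unigene = {"ACC2": "Hs.X"}
unigene_to_entrez = {"Hs.X": "102"}
affy_probe_to_entrez = {"p1": "999", "p3": "103"}

entrez_ids = map_probesets_to_entrez(
    probe_to_acc,
    acc_to_entrez,
    acc_to_unigene,
    unigene_to_entrez,
    affy_probe_to_entrez,
)
```

In this toy run p1 is resolved through the accession directly, p2
through UniGene, p3 only through the vendor file, and p4 stays
unmapped. The vendor value for p1 is ignored because the file is only
consulted for probesets that are still unmapped.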
We then search for other annotations by using the Entrez IDs. The table
below lists all the source data.
Table columns:
---------------
Environment -- name of the environment, e.g. CHRLOC represents hgu133plus2CHRLOC
Source Name -- name of the source database
Source URL  -- URL of the source data directory
Source Date -- date of the source data
------------------------------------------------------------------------------------------------------------------------------------------
Environment   Source Name                                Source URL                                                            Source Date
------------------------------------------------------------------------------------------------------------------------------------------
CHRLOC        UCSC Genome Bioinformatics (Homo sapiens)  ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens  2006-Apr14
CHR           Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
ENZYME        KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
GENENAME      Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
GO            Gene Ontology                              http://archive.godatabase.org/latest                                  200702
MAP           Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
OMIM          Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
PATH          KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
PMID          Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
REFSEQ        Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
SYMBOL        Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
UNIGENE       Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
ENZYME2PROBE  KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
GO2PROBE      Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
GO2ALLPROBES  Gene Ontology                              http://archive.godatabase.org/latest                                  200702
GO2ALLPROBES  Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
PATH2PROBE    KEGG GENOME                                ftp://ftp.genome.jp/pub/kegg/genomes                                  2007-Feb28
PFAM          The International Protein Index            ftp://ftp.ebi.ac.uk/pub/databases/IPI/current                         2007-Feb21
PROSITE       The International Protein Index            ftp://ftp.ebi.ac.uk/pub/databases/IPI/current                         2007-Feb21
PMID2PROBE    Entrez Gene                                ftp://ftp.ncbi.nlm.nih.gov/gene/DATA                                  2007-Feb28
------------------------------------------------------------------------------------------------------------------------------------------
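As a rough illustration of the second step (looking up other
annotations by Entrez ID), here is a small Python sketch. It is not the
package build code; the helper names and all IDs are invented. It joins
a probeset to Entrez map against a per-gene annotation table from one
of the sources, then inverts the result, which is one plausible way a
reverse environment such as PATH2PROBE could be derived.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def annotate(
    probe_to_entrez: Dict[str, Optional[str]],
    entrez_to_values: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Attach per-gene annotation values to each probeset via its Entrez ID."""
    return {
        probe: list(entrez_to_values.get(entrez, [])) if entrez is not None else []
        for probe, entrez in probe_to_entrez.items()
    }


def invert(probe_to_values: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the reverse map: annotation value -> probesets carrying it."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for probe, values in probe_to_values.items():
        for value in values:
            reverse[value].append(probe)
    return dict(reverse)


# Toy data; the probeset, gene, and pathway IDs are made up.
probe_to_entrez = {"p1": "101", "p2": "101", "p3": None}
entrez_to_path = {"101": ["path_a", "path_b"]}

probe_to_path = annotate(probe_to_entrez, entrez_to_path)
path_to_probe = invert(probe_to_path)
```

Probesets without an Entrez ID (p3 here) simply end up with no
annotation, which is why the quality of the first mapping step matters
for every other environment.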
hope this helps
nianhua