[BioC] AffyID mapping question
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Jul 2 17:53:32 CEST 2012
Hi Jiayi,
Side note: please CC the bioconductor list when replying to emails so
they can stay online -- you'll get better help (more eyeballs on your
problem), and the list can be used as a resource to others.
I guess this might be a pain using the "guest posting" stuff -- but
subscribing to the mailing list is easy, and you'll learn a lot by
skimming the posts that come through here.
OK -- now to solve your problem:
On Mon, Jul 2, 2012 at 11:03 AM, Jiayi Hou <houj2 at vcu.edu> wrote:
> Hey Steve,
>
> Sorry let me put it this way, so when a probeset hybridized to a given gene,
> the gene has a chromosomal location in terms of base pair. For a given gene,
> on average there may be 2-3 probesets attach to the same gene. However,
> these 2-3 probesets carrying different sequence of base pairs, are expected
> to attach to the different location on the given gene. What I am looking
> for is where precisely these probesets attach to the gene.
Thanks, that's a bit clearer now.
In the past I've done this with a little elbow grease: you can get the
probe sequence info for the chip you're using from this package:
http://bioconductor.org/packages/2.10/data/annotation/html/htmg430aprobe.html
There's a short vignette on matching probe sequences (against each
other, which isn't all that helpful for you, but can be a start) using
the Biostrings package here:
http://bioconductor.org/packages/2.10/bioc/vignettes/Biostrings/inst/doc/matchprobes.pdf
You can extend the examples there by matching your probes against the
mouse genome using the appropriate BSgenome package
(BSgenome.Mmusculus.UCSC.mm9).
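A rough sketch of that genome-matching step might look like the code below. It assumes the htmg430aprobe, Biostrings and BSgenome.Mmusculus.UCSC.mm9 packages are installed, and it simply takes the first probeset listed in the probe package as a stand-in for the one you care about. Treat it as a starting point rather than a tested recipe: scanning the whole genome once per probe is slow, and a probe that spans an exon-exon junction will not match the genomic sequence exactly.

```r
library(Biostrings)
library(BSgenome.Mmusculus.UCSC.mm9)
library(htmg430aprobe)

# Stand-in choice: the first probeset in the probe package.
# Replace this with the probeset ID you are interested in.
probeset <- htmg430aprobe$Probe.Set.Name[1]
probes   <- htmg430aprobe[htmg430aprobe$Probe.Set.Name == probeset, ]

# For each probe sequence, look for exact matches on every chromosome.
# With a BSgenome subject, vmatchPattern() searches both strands and
# returns a GRanges object (chromosome, start, end, strand).
hits <- lapply(as.character(probes$sequence), function(s) {
    vmatchPattern(DNAString(s), Mmusculus)
})
names(hits) <- probes$sequence
```

Each element of `hits` then holds the genomic location(s) of one probe, so `hits[[1]]` shows where the first probe of the probeset lands. A probe with no entry is worth checking against the transcript sequence instead.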
Alternatively, you can follow section 4.1 of the biomaRt vignette here:
http://bioconductor.org/packages/2.10/bioc/vignettes/biomaRt/inst/doc/biomaRt.pdf
For example:
R> library(biomaRt)
R> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
R> affyids <- c("202763_at","209310_s_at","207500_at")
R> getBM(attributes=c('affy_hg_u133_plus_2', 'hgnc_symbol',
       'chromosome_name','start_position','end_position', 'band'),
       filters = 'affy_hg_u133_plus_2', values = affyids, mart = ensembl)
  affy_hg_u133_plus_2 hgnc_symbol chromosome_name start_position end_position  band
1           202763_at       CASP3               4      185548850    185570663 q35.1
2         209310_s_at       CASP4              11      104813593    104840163 q22.3
3           207500_at       CASP5              11      104864962    104893895 q22.3
You'll have to change the "mart/dataset" you are using, as well as the
chip IDs, but you should get the idea.
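For a mouse array, the switch could be sketched roughly as follows. The dataset name "mmusculus_gene_ensembl" and the "mgi_symbol" attribute are the mouse counterparts of the human ones used above. The chip attribute and the probeset IDs below are placeholders only: look up the right attribute name for your array in the listAttributes() output, and use your own IDs. Also keep in mind that start_position and end_position describe the gene, not the individual probes.

```r
library(biomaRt)

# Connect to the mouse gene dataset instead of the human one.
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# List the Affymetrix-related attributes to find the one for your chip.
attrs      <- listAttributes(mouse)
affy_attrs <- attrs[grep("^affy_", attrs$name), c("name", "description")]
print(affy_attrs)

# ASSUMPTION: placeholder values for illustration only. Replace chip_attr
# with the attribute printed above that corresponds to your array, and
# probe_ids with probeset IDs from your own data.
chip_attr <- affy_attrs$name[1]
probe_ids <- c("1415670_at", "1415671_at")

# Same query as the human example, with the mouse-specific names swapped in.
res <- getBM(attributes = c(chip_attr, "mgi_symbol", "chromosome_name",
                            "start_position", "end_position", "band"),
             filters = chip_attr, values = probe_ids, mart = mouse)
res
```

If `res` comes back empty, the most likely cause is that the placeholder chip attribute does not match the array the probeset IDs come from.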
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact