[BioC] Retrieving gene name where given genomic region is included.

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 20 16:52:53 CET 2010


Hi,

On Mon, Jan 18, 2010 at 12:28 PM, Boel Brynedal <Boel.Brynedal at ki.se> wrote:
> Dear List,
>
> I have long lists of genomic regions (chr;start;end) where a given event
> has taken place. These regions can be an exon, an intronic region, or
> similar.  Most (all) of these events have taken place within the
> boundaries of genes, and I would like to retrieve the gene names
> (Ensembl ID).
>
> I've tried to use biomaRt:
>>
> getBM(attributes=c("ensembl_gene_id"),filter=c("chromosome_name","start","end"),
> values=list(10,17317394,17317851), mart=ensembl)
> [1] ensembl_gene_id
> <0 rows> (or 0-length row.names)
>
> But since no whole GENE is within these boundaries, I get nothing. I've
> also tried asking for "ensembl_exon_id" when looking at exon events (not
> all of them are of that kind however), and this generally results in a
> long list of exon IDs (because one exon can be part of several transcripts).
>
> I would appreciate any ideas of how this could be done in a better way.

In addition to the GenomicFeatures package, I've also been developing
a package called "GenomeAnnotations" that can handle situations like
this, for work I've been doing w/ *-seq data. It's not available
through the normal bioconductor/biocLite channels, however, so you'd
have to be comfortable installing packages from source in order to
use it. That isn't too difficult, assuming you're on linux/os x (I
don't really have any experience with windows, sorry).

A sample session that shows you how you could use my package to answer
this question would look like so:

R> library(GenomeAnnotations)
R> hg18r <- GenomeDB('hg18', 'refseq')
R> genes <- getGenesOnChromosome(hg18r, 10, 17317394, 17317851,
strictly.contained=FALSE)
R> names(genes)
[1] "VIM"

Which I guess is the gene you're looking for? Note that if
"strictly.contained" was TRUE, then you would have been given an empty
list.
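
If you don't have the package installed, the difference between the
two settings can be sketched in plain R. This is only an illustration
of overlap vs. containment, not how GenomeAnnotations is implemented:
the gene table, its coordinates, and the genes_in_region() helper are
all made up for the example.

```r
# Toy gene table; names and coordinates are hypothetical.
genes <- data.frame(
  name  = c("geneA", "geneB", "geneC"),
  chr   = c("10", "10", "10"),
  start = c(17300000, 17317500, 17400000),
  end   = c(17320000, 17317800, 17450000),
  stringsAsFactors = FALSE
)

# Return the names of genes on `chr` that match the query region
# [start, end].
#   strictly.contained = TRUE : the gene must lie entirely inside the region
#   strictly.contained = FALSE: any overlap between gene and region counts
genes_in_region <- function(genes, chr, start, end,
                            strictly.contained = FALSE) {
  on.chr <- genes$chr == as.character(chr)
  if (strictly.contained) {
    hit <- genes$start >= start & genes$end <= end
  } else {
    hit <- genes$start <= end & genes$end >= start
  }
  genes$name[on.chr & hit]
}

# Overlap query: also finds geneA, which spans the whole region.
genes_in_region(genes, 10, 17317394, 17317851, strictly.contained = FALSE)

# Containment query: only geneB fits entirely inside the region.
genes_in_region(genes, 10, 17317394, 17317851, strictly.contained = TRUE)
```

A short region that sits inside a gene is only picked up by the
overlap test, which is why the containment-style query comes back
empty for regions like yours.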

I have instructions on how to download and install the base
GenomeAnnotations package here:
http://wiki.github.com/lianos/GenomeAnnotations/

And an appropriate annotation package for your
genome/annotation-source of interest here (I have ones prebuilt for
hg18 using aceview and refseq annotations, as well as hg19 w/ refseq
annos):
http://wiki.github.com/lianos/GenomeAnnotations/installing-annotation-packages

There are other examples on how to use it here:
http://wiki.github.com/lianos/GenomeAnnotations/example-usage

There is some skeletal documentation for the package, available
through the normal ?function help after you've installed it, but I'm
working on making it better since the package is in active
development. If you end up using it, feel free to ask questions
and/or suggest ways you'd like to see it improved.

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


