[BioC] biomaRt getSequence through genomic position

Sean Davis sdavis2 at mail.nih.gov
Wed Dec 3 12:20:55 CET 2008


On Tue, Dec 2, 2008 at 11:18 PM,  <steffen at stat.berkeley.edu> wrote:
> Hi Paul,
>
> To retrieve sequences with biomaRt and mysql=TRUE, the package actually
> connects to two BioMarts: one is Ensembl and the other is the sequence
> BioMart.  However, the user only needs to connect to the Ensembl BioMart;
> under the hood, getSequence also connects to the sequence BioMart.  It
> looks like it doesn't disconnect, and this causes the error when you call
> it in a loop.  I'll try to provide a fix as soon as possible.
>
> Unfortunately, it is not possible to retrieve genomic sequences with
> mysql=FALSE.  We need to ask the Ensembl developers whether they could
> make this available through their BioMart web service.
>
> Cheers,
> Steffen
>
>> Dear Paul,
>>
>> and what is the output of sessionInfo()?
>>
>>   bw Wolfgang
>>
>> Paul Hammer wrote:
>>> hi all,
>>>
>>> I am trying to get sequences via the getSequence function from biomaRt.
>>> Specifically, I would like to have the last 5 bases of an exon and the
>>> last 5 bases of the following intron. My approach is as follows:
>>>
>>> library(biomaRt)
>>> ensembl_rat = useMart("ensembl", dataset="rnorvegicus_gene_ensembl")
>>> filter_rat = listFilters(ensembl_rat)
>>> rat_exonsLocs = getBM(attributes=c("ensembl_exon_id",
>>> "exon_chrom_start", "exon_chrom_end"), filter=filter_rat[c(14,45,12),1],
>>> values=list(chromosome="1", status="KNOWN", biotype="protein_coding"),
>>> mart=ensembl_rat)
>>> laenge = dim(rat_exonsLocs)[1]
>>>
>>> ensembl_rat2 = useMart("ensembl", dataset="rnorvegicus_gene_ensembl",
>>> mysql=TRUE)
>>> for(i in 1:laenge){
>>>   gseqs_exon = getSequence(chromosome = 1, start=rat_exonsLocs[i,3]-5,
>>>     end = rat_exonsLocs[i,3], mart = ensembl_rat2)
>>>   seqs_introns = getSequence(chromosome = 1, start=rat_exonsLocs[i+1,2]-5,
>>>     end=rat_exonsLocs[i+1,2], mart = ensembl_rat2)
>>> }
>>>
>>> but I always get this error message: "Error in mysqlNewConnection(drv,
>>> ...) : RS-DBI driver: (cannot allocate a new connection -- maximum
>>> of 16 connections already opened)"
>>>
>>> Is there a way to use useMart without mysql=TRUE to get sequences by
>>> genomic position alone? When I connect without mysql=TRUE
>>> (useMart("ensembl", dataset="rnorvegicus_gene_ensembl")), I always have
>>> to set seqType and type, and when I do this I don't get the 5 bases that
>>> I want!

Just an FYI: genomic sequence is also available via the BSgenome
package and its associated data packages.  Install that package, load it,
and then issue the available.genomes() command, which lists the
available genomes.  I imagine that rnorvegicus is one of them.
Install and load that data package as well, then follow the BSgenome
vignette to get the sequences.
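
A rough sketch of that route, under some assumptions: the data package
name (BSgenome.Rnorvegicus.UCSC.rn4) is a guess at what
available.genomes() lists, junction_windows() is a hypothetical helper
and not part of BSgenome, and the coordinates are toy values.  It also
assumes plus-strand exons of a single transcript, sorted by start, with
introns at least 5 bases long.

```r
# Sketch only: the data package name is an assumption; substitute whichever
# rat genome available.genomes() actually lists.
genome_pkg <- "BSgenome.Rnorvegicus.UCSC.rn4"

# Hypothetical helper (not part of BSgenome): coordinates of the last `width`
# bases of each exon and of the last `width` bases of the intron after it.
# Assumes plus-strand exons of ONE transcript, sorted by start.
junction_windows <- function(exon_start, exon_end, width = 5L) {
  n <- length(exon_start)
  if (n < 2L) {
    # A single exon has no following intron, so return an empty table.
    return(data.frame(exon_from = integer(0), exon_to = integer(0),
                      intron_from = integer(0), intron_to = integer(0)))
  }
  this_exon <- seq_len(n - 1L)   # every exon that has a following intron
  next_exon <- this_exon + 1L
  data.frame(
    exon_from   = exon_end[this_exon] - width + 1L,
    exon_to     = exon_end[this_exon],
    intron_from = exon_start[next_exon] - width,
    intron_to   = exon_start[next_exon] - 1L
  )
}

# Toy coordinates for illustration, not real rat exons.
win <- junction_windows(exon_start = c(101L, 301L, 501L),
                        exon_end   = c(200L, 400L, 600L))

# The sequence lookup only runs if the assumed data package is installed.
if (requireNamespace(genome_pkg, quietly = TRUE)) {
  library(BSgenome)
  genome <- getExportedValue(genome_pkg, "Rnorvegicus")
  chr <- rep("chr1", nrow(win))  # UCSC naming is "chr1", not Ensembl's "1"
  exon_tails   <- getSeq(genome, chr, start = win$exon_from,   end = win$exon_to)
  intron_tails <- getSeq(genome, chr, start = win$intron_from, end = win$intron_to)
}
```

Because getSeq takes vectors of start and end positions, no loop over
exons is needed, and no database connections are opened at all.  Do check
that the assembly behind the data package matches the one your Ensembl
coordinates come from.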

Sean


