[BioC] R: R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Sun Jun 28 09:16:49 CEST 2009
To get those you will need to download the mature.fa.gz and maturestar.fa.gz files from the miRBase ftp site: ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/. Then you can unzip them and do a grep to find the hsa miRs.
They'll be in fasta format, and whether or not Bioconductor can read them in I have no idea - I use Bioperl for all my sequence handling.
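For anyone who would rather stay in R, here is a minimal base-R sketch of the grep step described above. It assumes the miRBase header layout shown in the examples quoted below (name, then the MIMAT accession, then a description), so it also shows that the "other identifier" in those files is the miRBase accession. The function name read_mirbase_fasta, the temporary demo file and the "xxx" placeholder record are made up for illustration; they are not part of miRBase or of any Bioconductor package.

```r
# Hypothetical helper: read a miRBase-style FASTA file with base R only and
# keep the records whose name starts with a species prefix (default "hsa").
read_mirbase_fasta <- function(path, species = "hsa") {
  con <- gzfile(path, "r")            # gzfile() reads plain and gzip-compressed files
  on.exit(close(con))
  lines <- readLines(con)
  lines <- lines[nzchar(lines)]       # drop empty lines

  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)      # assumes the file starts with a header line

  headers <- sub("^>", "", lines[is_header])
  # Join the sequence lines of each record (the first line of a record is its header)
  seqs <- vapply(split(lines, record_id),
                 function(x) paste(x[-1], collapse = ""),
                 character(1))

  # Assumed header layout: "<name> <accession> <free-text description>"
  fields <- strsplit(headers, " ", fixed = TRUE)
  out <- data.frame(
    name      = vapply(fields, `[`, character(1), 1),
    accession = vapply(fields, `[`, character(1), 2),
    sequence  = unname(seqs),
    stringsAsFactors = FALSE
  )
  out[startsWith(out$name, paste0(species, "-")), , drop = FALSE]
}

# Small demo file: two records quoted in this thread plus one made-up
# non-human placeholder record ("xxx"), so the filter has something to drop.
demo <- tempfile(fileext = ".fa")
writeLines(c(
  ">hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a",
  "UGUAAACAUCCUCGACUGGAAG",
  ">xxx-miR-0 MIMAT9999999 Placeholder species miR-0",
  "ACGU",
  ">hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*",
  "CUAUACGACCUGCUGCCUUUCU"
), demo)

hsa_mirs <- read_mirbase_fasta(demo)
print(hsa_mirs)
```

With the real download you would pass the path to mature.fa or mature.fa.gz instead of the demo file. Whether a given entry counts as "validated" is a separate question that this sketch does not address.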
Mick
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of mauede at alice.it
Sent: Sun 28/06/2009 2:05 AM
To: Sean Davis
Cc: bioconductor List
Subject: [BioC] R: R: R: how to find the VALIDATED pair (miRNA,gene-3'UTR-sequence)
It is true I received a number of answers providing examples of data extraction from Ensembl.
However, none of them extracts any identifier contained in file "maturestar"
(ex. >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*
CUAUACGACCUGCUGCCUUUCU)
or in file "mature"
(ex. >hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a
UGUAAACAUCCUCGACUGGAAG)
or in file "/hsa.gff"
All three of the files mentioned above contain the miRNA identifier and another identifier that I do not recognize.
You may ask why I haven't tried to get all possible attribute values from Ensembl to check whether some relationship can be found.
Here is my answer in advance:
> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
Error in value[[3L]](cond) :
Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.
In fact I tried to ping server "www.biomart.org" and it did not work. ...
I deduce the server is really down at the moment.
In any case, I do not know whether any of the files mentioned above contains validated miRNAs.
Best regards,
Maura
-----Original Message-----
From: Sean Davis [mailto:seandavi at gmail.com]
Sent: Sat 27/06/2009 14:23
To: mauede at alice.it
Cc: Steve Lianoglou; bioconductor List
Subject: Re: [BioC] R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
On Sat, Jun 27, 2009 at 1:42 AM, <mauede at alice.it> wrote:
> What is the attribute corresponding to the miR name (ex. "hsa-miR-130a") ?
Hi, Maura.
This information is not available directly via biomaRt. You can use the
listAttributes() function to see which attributes are available if you are
ever in doubt.
>
>
> I have to link the gene information (actually right now I am only interested
> in the 3'UTR sequence) to the miRNA for which the gene in question is a
> target.
This question has been answered several times for you. You'll want to try
those suggestions. At the bottom of emails to this list, you will find a
link to search the archives in case you didn't save the emails sent to you
earlier.
Sean
>
> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
> Sent: Thu 25/06/2009 16:02
> To: mauede at alice.it
> Cc: bioconductor List
> Subject: Re: [BioC] R: how to find the VALIDATED pair (miRNA,
> gene-3'UTR-sequence)
>
> One more thing to add:
>
> >> Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385 + . 16.5359 1.687830e-02 ENST00000295228 INHBB
>
> > R> library(biomaRt)
> > R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> > R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924",
> >               "NM_005924", "NM_005924", "NM_019102")
> > R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
> >      'ensembl_transcript_id', 'refseq_dna'), filters='refseq_dna',
> >      values=refseqs, mart=hmart)
> >
> > R> gene.map
> >   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> > 1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
> > 2        MAFB ENSG00000204103       ENST00000396967  NM_005461
> > 3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
> > 4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102
>
>
> Your original ensembl transcript wasn't included in our result, so
> instead of telling the `getBM` function to use a list of refseq IDs to
> get info for, we can flip this around and find out what refseq ID your
> "ENST00000295228" transcript points to. Using the same `hmart` object,
> you can do it like so:
>
> R> getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
>      'ensembl_transcript_id', 'refseq_dna'),
>      filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)
>
>   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> 1       INHBB ENSG00000163083       ENST00000295228  NM_002193
>
> Note we just had to change the type of ID we are passing to the
> `filters` parameter.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor