[BioC] BSgenome.Mmulatta.UCSC
Martin Morgan
mtmorgan at fhcrc.org
Fri Nov 29 21:01:57 CET 2013
On 11/29/2013 11:18 AM, Brian Smith wrote:
> Thanks Martin. This is what I get:
>
> =========
> > library(AnnotationHub)
> > hub = AnnotationHub()
> > Mmulatta2 = hub$ensembl.release.72.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.72.dna.toplevel.fa.rz
> > getSeq(Mmulatta2, GRanges("1", IRanges(567089, width=1)))
>   A DNAStringSet instance of length 1
>       width seq                     names
>   [1]     1 G                       1
> =========
>
> However, if I check this position against UCSC, it gives a 'G' for rheMac2
> (2006) but a 'T' for rheMac3 (2010). So, is this still giving me the old
> (2006) assembly?
As the path suggests, the genome comes from Ensembl; a little work leads to
http://www.ensembl.org/Macaca_mulatta/Info/Annotation#assembly
where we're told that this is
Assembly: MMUL 1.0, Feb 2006
i.e., the older 2006 assembly. So I guess my original suggestion to use
AnnotationHub wasn't helpful in this case. Sorry about that, but maybe not all
for nothing...
Round 2: download and uncompress the UCSC rheMac3 FASTA
wget http://hgdownload-test.cse.ucsc.edu/goldenPath/rheMac3/bigZips/rheMac3.fa.gz
gunzip rheMac3.fa.gz
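The wget/gunzip step can also be done without leaving R. The sketch below is a base-R alternative to R.utils::gunzip() (a swapped-in technique, not what this message uses); the helper name gunzip_base() and the 1e6-byte chunk size are hypothetical choices. It copies the decompressed stream through a gzfile() connection in chunks, so the genome never has to sit in memory all at once.

```r
## Base-R alternative to R.utils::gunzip(): stream-decompress a .gz file.
## gunzip_base() is a hypothetical helper; the chunk size is arbitrary.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 1e6L) {
  inp <- gzfile(src, open = "rb")          # reads decompressed bytes
  on.exit(close(inp), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inp, what = "raw", n = chunk)
    if (length(buf) == 0L) break           # end of stream
    writeBin(buf, out)
  }
  invisible(dest)
}

## Usage (needs network access; URL as given in the message):
## url <- "http://hgdownload-test.cse.ucsc.edu/goldenPath/rheMac3/bigZips/rheMac3.fa.gz"
## download.file(url, destfile = "rheMac3.fa.gz", mode = "wb")
## gunzip_base("rheMac3.fa.gz")   # writes rheMac3.fa
```

Unlike R.utils::gunzip(), which by default deletes the .gz afterwards, this sketch leaves the compressed file in place.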
(You could do the above entirely in R with download.file() and
R.utils::gunzip().) In R, re-compress with razip and index (both steps are
relatively lengthy, but need to be done only once):
library(Rsamtools)
razip("rheMac3.fa")
indexFa("rheMac3.fa.rz")
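Why the index matters: roughly speaking, a .fai index (the samtools faidx format, with columns NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) lets the byte holding any base be computed and read directly instead of scanning the whole genome. The toy base-R sketch below illustrates that arithmetic on a small, uncompressed, made-up FASTA with a hand-written index entry; the names toy_fa, toy_fai and fetch_base() are hypothetical. It is an illustration only: Rsamtools does the real work in the samtools C code, and for a razip file the compression layer additionally maps these uncompressed offsets onto compressed blocks.

```r
## Toy illustration of .fai-style random access on an uncompressed FASTA.
## File contents and index values are made up for the example.
toy_fa <- tempfile(fileext = ".fa")
## Write exact bytes (Unix newlines) so the offsets below are predictable
writeBin(charToRaw(">chr1\nACGTACGTAC\nGGTTCCAAGG\nTTA\n"), toy_fa)

## Hand-written index entry for chr1:
##   length    = 23 bases
##   offset    = 6  (bytes in ">chr1\n", i.e. 0-based byte of the first base)
##   linebases = 10 bases per full line
##   linewidth = 11 bytes per full line (bases + "\n")
toy_fai <- list(name = "chr1", length = 23L, offset = 6L,
                linebases = 10L, linewidth = 11L)

## Return the base at 1-based position 'pos' by seeking straight to its byte
fetch_base <- function(path, idx, pos) {
  stopifnot(pos >= 1, pos <= idx$length)
  p <- pos - 1L                                  # 0-based position
  byte <- idx$offset +
    (p %/% idx$linebases) * idx$linewidth +      # skip whole lines
    p %% idx$linebases                           # then move within the line
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = byte)
  rawToChar(readBin(con, what = "raw", n = 1L))
}

fetch_base(toy_fa, toy_fai, 1)    # returns "A"
fetch_base(toy_fa, toy_fai, 11)   # returns "G" (first base of line 2)
fetch_base(toy_fa, toy_fai, 23)   # returns "A" (last base)
```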
Then use it:
fa = FaFile("rheMac3.fa.rz")
getSeq(fa, GRanges("chr1", IRanges(567089, width=1)))  # UCSC names sequences chr1, chr2, ...
Hopefully you want more than just the 'T'!
Martin
>
> thanks!
>
> On Thu, Nov 28, 2013 at 11:52 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 11/28/2013 07:20 AM, Brian Smith wrote:
>
> Hi Martin,
>
> Thanks for the reply!
>
> I get the following:
>
> --------------------------------------------------
> > library(AnnotationHub)
> > hub = AnnotationHub()
> > Mmulatta2 = hub$ensembl.release.73.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.73.dna.toplevel.fa.rz
> Warning message:
> In .getResource(x, name) : incomplete path
>
>
> Probably you are using a version of R for which this resource is not
> available. There is tab completion: once you get to
>
> hub$ensembl.release.7
>
> press the tab key. You'll likely see
>
> > hub$ensembl.release.7
> hub$ensembl.release.70. ... [427] hub$ensembl.release.72. ... [393]
> hub$ensembl.release.71. ... [426]
>
> and you can complete to
>
> Mmulatta2 = hub$ensembl.release.72.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.72.dna.toplevel.fa.rz
>
> Print Mmulatta2 to the console to ensure that it is an 'FaFile' object.
>
>
> > getSeq(Mmulatta2, GRanges("1", IRanges(567089, width=1))) # A DNAStringSet instance of length 1
> Error in (function (classes, fdef, mtable)  :
>   unable to find an inherited method for function ‘getSeq’ for signature ‘"character"’
>
> --------------------------------------------------
>
> Am I doing something wrong, or do I need to install another package?
>
> thanks!
>
>
> On Wed, Nov 27, 2013 at 2:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 11/27/2013 08:58 AM, Brian Smith wrote:
>
> Hi,
>
> I wanted to use the Mmulatta genome in Bioconductor, but using
> "available.genomes()", I see that only rheMac2 (from 2006) is
> available.
>
> UCSC also shows rheMac3 (from 2010). Is there a way that I can
> download/incorporate this?
>
> Essentially, I want to find the nucleotide at specific positions in the
> rhesus genome (e.g. chr1:567089).
>
>
> Depending on what you're actually interested in,
>
> library(AnnotationHub)
> hub = AnnotationHub()
> Mmulatta = hub$ensembl.release.73.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.73.dna.toplevel.fa.rz
>
>
> and then
>
> > getSeq(Mmulatta, GRanges("1", IRanges(567089, width=1)))
>   A DNAStringSet instance of length 1
>       width seq                     names
>   [1]     1 G                       1
>
> Martin
>
>
> thanks!!
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793