[BioC] BSgenome.Mmulatta.UCSC

Fri Nov 29 21:01:57 CET 2013

On 11/29/2013 11:18 AM, Brian Smith wrote:
> Thanks Martin. This is what I get:
>
> =========
>  > library(AnnotationHub)
>  >   hub = AnnotationHub()
>  >   Mmulatta2 =
> hub$ensembl.release.72.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.72.dna.toplevel.fa.rz
>  >   getSeq(Mmulatta2, GRanges("1", IRanges(567089, width=1)))
>    A DNAStringSet instance of length 1
>     width seq                names
> [1]     1 G                  1
> =========
>
> However, if I check this position against UCSC, it gives a 'G' for rheMac2
> (2006), but a 'T' for rheMac3 (2010). So, is this still giving me the old
> (2006) assembly?

As the path suggests, the genome comes from Ensembl; a little digging leads to

   http://www.ensembl.org/Macaca_mulatta/Info/Annotation#assembly

where we're told that this is

Assembly:	MMUL 1.0, Feb 2006

So I guess my original suggestion to use AnnotationHub wasn't helpful in this 
case. Sorry about that, but maybe not all for nothing...

Round 2:

Download and uncompress:

wget http://hgdownload-test.cse.ucsc.edu/goldenPath/rheMac3/bigZips/rheMac3.fa.gz
gunzip rheMac3.fa.gz

(You could do the above entirely in R with download.file() and
R.utils::gunzip().) Then, in R, re-compress with razip and index the result; both
steps are relatively lengthy, but need to be done only once.

   library(Rsamtools)
   razip("rheMac3.fa")
   indexFa("rheMac3.fa.rz")
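For the download step, the all-R route mentioned above might look like the sketch below. It assumes the R.utils package is installed; the helper name fetch_fasta and its arguments are made up for illustration, and mode = "wb" is there so the compressed file is not mangled on Windows.

```r
# Hypothetical helper: download a gzipped FASTA and uncompress it, all from R.
# Skips the work if the uncompressed file is already present.
fetch_fasta = function(url, gz = basename(url), fa = sub("\\.gz$", "", gz)) {
    if (!file.exists(fa)) {
        # mode = "wb" keeps the binary .gz intact on every platform
        download.file(url, destfile = gz, mode = "wb")
        # remove = FALSE keeps the .gz next to the uncompressed file
        R.utils::gunzip(gz, destname = fa, remove = FALSE)
    }
    fa
}

# Usage (a large download, so commented out here):
# fetch_fasta("http://hgdownload-test.cse.ucsc.edu/goldenPath/rheMac3/bigZips/rheMac3.fa.gz")
```

The function returns the path of the uncompressed file, which can then be handed to razip() and indexFa() as above.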

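Then use it. Note that UCSC names its sequences "chr1", "chr2", ..., whereas the
Ensembl file used "1", "2", ...:

   fa = FaFile("rheMac3.fa.rz")
   getSeq(fa, GRanges("chr1", IRanges(567089, width=1)))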

Hopefully you want more than just the 'T'!
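
If you do want more than one base, getSeq() accepts a GRanges holding many ranges in a single call. A minimal sketch, assuming the compressed and indexed file from above sits in the working directory and that the UCSC file names the chromosome "chr1"; the helper name bases_at and the extra positions in the usage line are invented for illustration. scanFaIndex(fa) should list the sequence names actually present in the file if in doubt.

```r
# Hypothetical helper: look up single bases at several positions on one sequence.
bases_at = function(fa_path, chrom, positions) {
    fa = Rsamtools::FaFile(fa_path)
    # one width-1 range per requested position
    gr = GenomicRanges::GRanges(chrom, IRanges::IRanges(positions, width = 1))
    # getSeq() on an FaFile returns a DNAStringSet, one element per range
    Biostrings::getSeq(fa, gr)
}

# Usage, assuming rheMac3.fa.rz and its index exist:
# bases_at("rheMac3.fa.rz", "chr1", c(567089, 567090, 567091))
```

The package-qualified calls are only there so the helper does not depend on what is attached; with library(Rsamtools) loaded, the plain names used earlier work just as well.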

Martin

>
> thanks!
>
>
>
>
> On Thu, Nov 28, 2013 at 11:52 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>     On 11/28/2013 07:20 AM, Brian Smith wrote:
>
>         Hi Martin,
>
>         Thanks for the reply!
>
>         I get the following:
>
>         ----------------------------------------------------
>           > library(AnnotationHub)
>           > hub = AnnotationHub()
>           > Mmulatta2 =
>         hub$ensembl.release.73.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.73.dna.toplevel.fa.rz
>         Warning message:
>         In .getResource(x, name) : incomplete path
>
>
>     Probably you are using a version of R for which this resource is not
>     available. There is tab completion; once you get to
>
>     hub$ensembl.release.7
>
>     press the tab key. Likely you'll see
>
>      > hub$ensembl.release.7
>     hub$ensembl.release.70. ... [427]  hub$ensembl.release.72. ... [393]
>     hub$ensembl.release.71. ... [426]
>
>     and you can complete to
>
>     Mmulatta2 =
>     hub$ensembl.release.72.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.72.dna.toplevel.fa.rz
>
>     Print Mmulatta2 to the console to ensure that it is an 'FaFile' object.
>
>
>           > getSeq(Mmulatta2, GRanges("1", IRanges(567089, width=1)))
>         Error in (function (classes, fdef, mtable)  :
>             unable to find an inherited method for function ‘getSeq’ for signature
>         ‘"character"’
>
>         ----------------------------------------------------
>
>         Am I doing something wrong, or do I need to install another package?
>
>         thanks!
>
>
>         On Wed, Nov 27, 2013 at 2:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>              On 11/27/2013 08:58 AM, Brian Smith wrote:
>
>                  Hi,
>
>                  I wanted to use the Mmulatta genome in Bioconductor, but using
>                  "available.genomes()", I see that only rheMac2 (from 2006) is
>                  available.
>
>                  UCSC also shows rheMac3 (from 2010). Is there a way that I can
>                  download/incorporate this?
>
>                  Essentially, I want to find the nucleotide at specific
>                  positions in the rhesus genome (e.g. chr1 - 567089).
>
>
>              Depending on what you're actually interested in,
>
>                 library(AnnotationHub)
>                 hub = AnnotationHub()
>                 Mmulatta =
>         hub$ensembl.release.73.fasta.macaca_mulatta.dna.Macaca_mulatta.MMUL_1.73.dna.toplevel.fa.rz
>
>
>              and then
>
>               > getSeq(Mmulatta, GRanges("1", IRanges(567089, width=1)))
>                A DNAStringSet instance of length 1
>                   width seq                                               names
>              [1]     1 G                                                 1
>
>              Martin
>
>
>                  thanks!!
>
>                  _______________________________________________
>                  Bioconductor mailing list
>                  Bioconductor at r-project.org
>                  https://stat.ethz.ch/mailman/listinfo/bioconductor
>                  Search the archives:
>                  http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
>

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793