[BioC] views on Rle using GRanges object
Hervé Pagès
hpages at fhcrc.org
Mon Mar 31 19:09:29 CEST 2014
Hi Michael,
On 03/31/2014 05:22 AM, Michael Lawrence wrote:
> Raising this one from the dead: this would be a very nice thing to have.
> It's tough explaining to users how to bridge GRanges and RangesList for
> aggregating vectors (RleLists) by GRanges. We could start with
> rangeSums, rangeMeans, rangeMaxs, etc, that directly summarize and
> RleList, and then maybe move towards a GenomicViews object. Maybe
> something for the next release cycle?
Yeah I'd like to start working on something like this. Note that, kind
of related to this, I added subsetting a named List by a GRanges
subscript recently. But what we really need is a simple container that
bundles a GRanges object with a named List subject, I think.
H.
>
> Michael
>
>
> On Wed, Aug 17, 2011 at 5:55 AM, Michael Lawrence <michafla at gene.com
> <mailto:michafla at gene.com>> wrote:
>
>
>
> 2011/8/16 Hervé Pagès <hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>
> Michael,
>
> On 11-08-16 10:52 AM, Michael Lawrence wrote:
>
>
>
> 2011/8/16 Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>>>
>
>
> Hi Michael,
>
>
> On 11-08-15 10:28 PM, Michael Lawrence wrote:
>
> On Mon, Aug 15, 2011 at 10:10 PM, Michael
> Lawrence<michafla at gene.com
> <mailto:michafla at gene.com> <mailto:michafla at gene.com
> <mailto:michafla at gene.com>>>____wrote:
>
>
>
>
> On Mon, Aug 15, 2011 at 5:25 PM, Janet
> Young<jayoung at fhcrc.org
> <mailto:jayoung at fhcrc.org> <mailto:jayoung at fhcrc.org
> <mailto:jayoung at fhcrc.org>>> wrote:
>
>
> Hi again,
>
> I have another question, in the "I think I
> need a
> convoluted workaround,
> but maybe I'm missing a simple solution"
> genre (seems
> like most of my
> questions are like that).
>
> I have an RleList object representing
> mapability for the
> whole human
> genome. I'd like to:
> (a) calculate viewMeans for various regions
> of interest
> that I've been
> representing as GRanges, and
> (b) I'd like the underlying code to be smart
> and match
> chromosome names up
> in the RleList and the GRanges object (not
> rely on
> chromosomes being ordered
> the same in the two objects), and
> (c) I'd like to return the viewMeans results
> in the same
> order as the
> objects in my original GRanges
>
> I don't think this is currently implemented
> without
> doing several
> coercions (that introduce their own
> complications) but
> I'm not sure. Some
> code is below that shows what I'm trying to
> do. It
> seems like it might be
> a common kind of way to use viewMeans - I
> imagine people
> are gradually
> switching to use GRanges more and more? but
> really I
> don't know.
>
> My real world question is that I've read in
> mapability
> from a UCSC bigWig
> file, and made that into an RleList. I have
> a bunch of
> other regions I've
> read in (again from UCSC) using rtracklayer
> as GRanges,
> and have annotated
> those with various things I'm interested in
> (e.g. number
> of RNAseq reads in
> the region). I want to add the average
> mapability for
> each region, so that
> I can later look at how mapability affects
> those other
> things I'm
> annotating.
>
> Should I be being more strict with myself
> about how my
> GRanges are ordered
> and making sure they all have the same
> seqlevels and
> seqlengths? Perhaps
> that would help the coercion workarounds go
> more smoothly.
>
> thanks,
>
> Janet
>
>
> library(GenomicRanges)
>
> ### make an RleList. Just for this example,
> I starting
> with a GRanges
> object and used coverage to get an RleList
> example. To
> get my real data I
> read in a UCSC bigWig file.
>
> fakeRegions<-
> GRanges(seqnames=c("chrA","____chrA",
> "chrB", "chrC"),
> ranges=IRanges(start=c(1,51,1,____1),
> end=c(60,90,20,10) ) )
> seqlengths(fakeRegions)<- c(100,100,100)
> myRleList<- coverage(fakeRegions)
> rm(fakeRegions)
>
> myRleList
>
> # SimpleRleList of length 3
> # $chrA
> # 'integer' Rle of length 100 with 4 runs
> # Lengths: 50 10 30 10
> # Values : 1 2 1 0
> #
> # $chrB
> # 'integer' Rle of length 100 with 2 runs
> # Lengths: 20 80
> # Values : 1 0
> #
> # $chrC
> # 'integer' Rle of length 100 with 2 runs
> # Lengths: 10 90
> # Values : 1 0
>
> ### make some regions of interest
> myRegions<- GRanges(seqnames=c("chrB",
> "chrC", "chrB"),
> ranges=IRanges(start=c(1,1,5),
> end=c(20,20,10) ),
> geneNames=c("g1","g2","g3") )
> myRegions
> # GRanges with 3 ranges and 1
> elementMetadata value
> # seqnames ranges strand | geneNames
> #<Rle> <IRanges> <Rle> |<character>
> # [1] chrB [1, 20] * | g1
> # [2] chrC [1, 20] * | g2
> # [3] chrB [5, 10] * | g3
> #
> # seqlengths
> # chrB chrC
> # NA NA
>
> ## can't use GRanges directly
> Views( myRleList, myRegions)
> # Error in RleViewsList(rleList = subject,
> rangesList =
> start) :
> # 'rangesList' must be a RangesList object
>
> ## can't use a simple coercion
> Views( myRleList, as(myRegions,"RangesList") )
> # Error in .Method(..., FUN = FUN, MoreArgs
> = MoreArgs,
> SIMPLIFY =
> SIMPLIFY, :
> # all object lengths must be multiple of
> longest
> object length
>
> ### although I can use that coercion if I
> first give the
> GRanges object
> the same levels as the RleList to force the
> lists to
> have the same names as
> each other:
> seqlevels(myRegions)<- names(myRleList)
> viewMeans(Views( myRleList,
> as(myRegions,"RangesList") ) )
> # SimpleNumericList of length 3
> # [["chrA"]] numeric(0)
> # [["chrB"]] 1 1
> # [["chrC"]] 0.5
>
>
> These two lines seem pretty concise to me. It
> makes sense to
> layer a
> RangesList on top of an RleList. Storing the
> result would of
> course be
> easier if you had sorted the GRanges by the
> seqlevels.
> Having a utility for
> that would be nice (orderBySeqlevels?). It would
> also be
> easy with
> RangedData, which enforces ordering by space.
>
>
> Just to clarify, here is how the function would work:
>
>
> values(myRegions)$meanCov[____orderBySeqlevels(myRegions)]<-
> unlist(viewMeans(Views( myRleList,
> as(myRegions,"RangesList") ) ),
> use.names=FALSE)
>
> So 'orderBySeqlevels' would be a nice bridge back
> into the flat
> GRanges
> world from the Listy world of IRanges.
>
>
> Well I'm just remembering now that we still don't have
> an "order"
> method for GRanges objects like we do for IRanges
> object. The "natural"
> order for a GRanges object would be to order first by
> (a) seqlevel,
> (b) then by strand, (c) then by start, (d) and finally
> by end.
>
> We already do (c) and (d) for IRanges.
>
> Also the "reduce" method for GRanges already uses this
> "natural"
> order implicitly.
>
> So we will add this "order" method and the other basic
> order-related
> methods (sort, >=, <=, ==, !=, unique, duplicated, they
> are all
> missing) for GRanges objects. That will cover Janet's
> use case and
> other user cases (I think someone asked how to find
> duplicated in
> a GRanges object a while ago on this list).
>
>
> An "order" method would be great. Note though that the
> coercion to
> RangesList only sorts by seqlevels, so we would need
> something that gave
> out that order vector. The point here is to avoid forcing
> the user to
> modify the order of the original GRanges.
>
>
> With order() the user will still be able to achieve this with
> something
> like:
>
> regionMeans <- function(regions, cvg)
> {
> seqlevels(regions) <- names(cvg)
> oo <- order(regions)
> regions <- regions[oo]
> ans <- unlist(
> viewMeans(
> Views(cvg, as(regions, "RangesList"))
> ), use.names=FALSE)
> ans[oo] <- ans # restore original order
> ans
> }
>
> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList)
>
> Tricky :-/
>
> But maybe we could provide a "mean" method that accepts 2 args:
> a GRanges and an RleList/IntegerList/__NumericList. Same with
> min, max, sum etc... they would all return a numeric vector of the
> same length as their first argument (the GRanges object).
>
>
> Patrick and I had talked about this. We were calling it the
> rangeMeans function. It would go along with rangeMins, rangeMaxs,
> etc. Basically a way to perform a windowed computation without
> having to construct an intermediate Views. This is a lot easier to
> design and implement, as you were saying below.
>
> There are a lot of other (non-Rle) Vectors that would benefit from
> window-based summaries. These include the basic vector types, as
> well as ranges themselves. For example, I and others have
> encountered a need to find the nearest neighbors of a set of ranges,
> with the constraint that they fall within the same gene or exon or
> whatever. One could come up with a GRanges with the seqnames
> corresponding to a gene, but that's a hack and the current looping
> implementation of nearest,GRanges is a bit slow.
>
> Michael
>
> Another approach could be that we make the Views() constructor more
> flexible so Views(myRleList, myRegions) would just work and return
> a Views object (not a ViewsList) but that means redefining what a
> Views object can be (@subject can now be a named List and @ranges
> a GRanges object). Maybe a cleaner design would be to use a new
> container for this e.g. GViews (for Genomic Views)...
>
> H.
>
>
>
> Michael
>
> Cheers,
> H.
>
>
>
> Michael
>
>
>
> ### getting close to a solution - but I'd
> have liked to
> have been able to
> unlist this and directly add to my GRanges
> object e.g.
> values(myRegions)$meanCov<-
> unlist(viewMeans(Views(
> myRleList,
> as(myRegions,"RangesList") ) ), use.names=FALSE)
> ### but the regions are in a different order
> to how I
> started, so the
> above command would associate the wrong
> scores with the
> wrong regions.
>
> sessionInfo()
>
> R version 2.13.1 (2011-07-08)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1]
> en_US.UTF-8/en_US.UTF-8/C/C/____en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils
> datasets
> methods base
>
> other attached packages:
> [1] GenomicRanges_1.4.6 IRanges_1.10.5
>
>
> ___________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-__project.org
> <mailto:Bioconductor at r-project.org>>
>
> https://stat.ethz.ch/mailman/____listinfo/bioconductor
> <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>
> <https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>>
> Search the archives:
> http://news.gmane.org/gmane.____science.biology.informatics.____conductor
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor <http://news.gmane.org/gmane.science.biology.informatics.conductor>>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> ___________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-__project.org
> <mailto:Bioconductor at r-project.org>>
>
> https://stat.ethz.ch/mailman/____listinfo/bioconductor
> <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>
> <https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>>
> Search the archives:
> http://news.gmane.org/gmane.____science.biology.informatics.____conductor
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor <http://news.gmane.org/gmane.science.biology.informatics.conductor>>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list