[BioC] GRanges setdiff With Features on Both Strands

Cook, Malcolm MEC at stowers.org
Mon Jan 23 23:13:11 CET 2012


I'm sold.  Excellent compromise.  Thanks for considering my opinion.....

Cheers!

~Malcolm


> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fhcrc.org]
> Sent: Monday, January 23, 2012 2:33 PM
> To: Michael Lawrence
> Cc: Cook, Malcolm; D.Strbenac at garvan.org.au; bioconductor at r-project.org
> Subject: Re: [BioC] GRanges setdiff With Features on Both Strands
> 
> Hi Michael, Malcolm,
> 
> On 01/23/2012 11:14 AM, Michael Lawrence wrote:
> >
> >
> > On Mon, Jan 23, 2012 at 7:12 AM, Cook, Malcolm <MEC at stowers.org
> > <mailto:MEC at stowers.org>> wrote:
> >
> >      > > As for the weird behavior you get with ignore.strand=TRUE, I agree
> >      > > it doesn't make much sense and it's not totally clear to me
> >     what the
> >      > > correct behavior should be. Sadly the man page doesn't say anything
> >      > > about what the intended behavior is (there is only a brief note
> >     about
> >      > > what ignore.strand=TRUE does for parallel set operations), gives no
> >      > > example, and this case is not covered by the unit tests either.
> >      > >
> >      > > However, by looking at what union() and gaps() do with
> >      > > ignore.strand=TRUE, and also at the internals of the GenomicRanges
> >      > > package, it seems to me that the intention was that:
> >      > >
> >      > >  union(x, y, ignore.strand=TRUE)
> >      > >  intersect(x, y, ignore.strand=TRUE)
> >      > >  setdiff(x, y, ignore.strand=TRUE)
> >      > >
> >      > > would be equivalent to:
> >      > >
> >      > >  union(`strand<-`(x, "+"), `strand<-`(y, "+"))
> >      > >  intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
> >      > >  setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))
> >      > >
> >      > > So I could fix setdiff() and intersect() to do this
> >     (intersect() has
> >      > > a weird behavior too), and update the documentation. Does that
> >     sound
> >      > > reasonable?
> >      > >
> >      > >
> >      > This would be reasonable to me.
> >
> >     I think that retaining `ignore.strand` with the proposed semantics
> >     of (potentially) altering the strand is inviting further confusion.
> >
> >     For example, it would introduce the possibility that the union of x
> >     with itself is not all.equal to itself only in the case when the
> >     strand of x is not '+'.
> >
> >     I think this is a markedly poor outcome since `ignore.strand` is
> >     clearly NOT what the operation would be doing.
> >
> >
> > I would expect that a set operation ignoring strand would generate
> > ranges with "*" as the strand. There's no way to reliably preserve the
> > strand. So I don't see how the original strand would matter.
> 
> I think that Malcom's point is that it's an unpleasant surprise that
> some very fundamental and intuitive mathematical properties are not
> honored, e.g. the union (or intersection) of x with itself is not
> itself. I can live with that one though: this happens only if you
> use ignore.strand=TRUE, which is not the default. And we still have
> the property that the union (or intersection) of x with itself is
> itself *modulo* the strand. Probably good enough.
> 
> Now whether
> 
>    union(x, y, ignore.strand=TRUE)
>    intersect(x, y, ignore.strand=TRUE)
>    setdiff(x, y, ignore.strand=TRUE)
> 
> should be equivalent to
> 
>    union(`strand<-`(x, "+"), `strand<-`(y, "+"))
>    intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
>    setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))
> 
> or to
> 
>    union(`strand<-`(x, "*"), `strand<-`(y, "*"))
>    intersect(`strand<-`(x, "*"), `strand<-`(y, "*"))
>    setdiff(`strand<-`(x, "*"), `strand<-`(y, "*"))
> 
> is just a cosmetic question but I agree that the output of the latter
> will look prettier (and might be slightly less confusing). So I can do
> this. Does anybody object?
> 
> >
> >      > > Finally I'd like to mention that I see very little value in
> >     supporting
> >      > > the 'ignore.strand' arg in those 3 methods so another option
> >     would be
> >      > > to just get rid of it.
> >      > >
> >      > >
> >      > I think there is value here. Often, one wants to ignore the
> >     strand in these
> >      > operations (say between reads and transcripts) without destroying the
> >      > strand information. Creating temporary objects is a nuisance.
> >
> >     Perhaps a compromise then.....
> >
> >     Since the implementation appears to have been undocumented and the
> >     intention unclear, how about replace `ignore.strand` option with a
> >     `with.strand=c('+','-','*')` option which sets the GRanges strand
> >     prior to performing the operation.
> >
> >
> >
> > I would prefer keeping consistent with the rest of the API and having
> > ignore.strand simply be the equivalent of with.strand="*".  I
> > acknowledge that the meaning of "*" is ambiguous, but it's easily
> > preferred over "+" and "-".
> >
> 
> Yes a lot of methods already have the 'ignore.strand' arg so if we're
> going to keep that feature for union(), intersect() and setdiff(), I'd
> rather stick to that API too.
> 
> Cheers,
> H.



More information about the Bioconductor mailing list