[BioC] function precede() not working with GRanges

Hervé Pagès hpages at fhcrc.org
Wed Jan 12 11:39:49 CET 2011


Hi Steve,

On 01/11/2011 07:20 PM, Steve Lianoglou wrote:
> Hi,
>
> I'm not Michael, but:
>
>> How general is the "peaks do not have a direction" statement?
>> In my experience (RNA-seq experiments), peaks are "stranded features"
>> i.e. they belong to a particular strand.
>
> I'm actually a bit surprised by that. It was my impression that most
> of the rna-seq data to date have been done with a protocol that
> doesn't honor the strand of the reads.

Just to clarify, I was not talking about protocols that honor or not
the original strand of the reads. I was talking about the strand to
which each read aligns. This information is stored in the SAM/BAM file
and propagates to the GRanges object you get when loading this file
(with something like 'as(read.GappedAlignments(myfile), "GRanges")'.
The strand information is here whatever type of *-seq experiment it
is (RNA-seq, CHIP-seq, ...)

>
> Recently, I've come across several protocols where strandedness is
> maintained and I'm sure that these will be increasingly the norm as
> time goes on.
>
> I've been working on/with different variants of some (rna)
> tag-sequencing protocols, and these do typically keep strand info, but
> I think the "old" Illumina-driven transcriptome-wide rna-seq data that
> "most" people think of as RNA-seq is unstranded.
>
> Still. I like that GenomicRanges can honor strandedness in
> overlap/etc. functions when the strand is set.
>
>>> One thing that often confuses me with GenomicRanges is the dual meaning of
>>> strand. It seems to indicate both a direction and a coordinate. It makes
>>> sense to me to have resize() and precede() consider strand as a direction.
>>> But findOverlaps() treats strands as coordinates such that items on
>>> different strands do not overlap. This makes less sense.
>>
>> Why? IMO a positive read (i.e. a read that was aligned to the plus
>> strand) shouldn't hit features (genes/transcripts/exons) that are
>> located on the minus strand. If for whatever reason this is not what
>> the user wants, then s/he can always set the strand of the query and
>> subject to '*' but I wouldn't say this is the typical usecase.
>>
>>> Another good
>>> example is gaps() which will yield sequence-long ranges on the strands not
>>> represented in the original object. This is very often undesirable. But of
>>> course my perspective is limited.
>>
>> I agree that the behaviour of gaps() on GRanges is awkward. You not only
>> get those sequence-long ranges on the strands not represented in the
>> original object but you also get them for sequences not represented at
>> all (as long as they are listed in levels(seqnames(x))). But if gaps()
>> wasn't doing this, gaps() wouldn't be reversible anymore (i.e.
>> 'gaps(gaps(x))' wouldn't be indentical to 'x') which is kind of an
>> important property for gaps(). In particular, gaps(x) would give
>> the same result whether 'x' as a sequence-long range on one strand
>> or not (very unlikely to happen with real data though).
>>
>> Maybe an extra arg to gaps() would help control this? Suggestions are
>> welcome. I'll make the change. Thanks!
>
> I'd be in favor of adding a param to GRanges::gaps() that would allow
> me to explicitly turn off this "feature" ;-)

OK, will see what I can do. Thanks for your feedback!

H.

>
> -steve
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list