[BioC] how to merge GRanges objects

Hervé Pagès hpages at fhcrc.org
Wed Oct 16 19:54:18 CEST 2013


Hi John,

On 10/16/2013 07:15 AM, John linux-user wrote:
> Hello everyone,
>
> I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below
>
> obj1
>
> seqnames           ranges strand |       Val
>              <Rle>        <IRanges>  <Rle> | <integer>
>    [1] chr1_random [272531, 272571]      + |        88
>    [2] chr1_random [272871, 272911]      + |        45
>
> obj2
>   seqnames           ranges strand |       Val
>              <Rle>        <IRanges>  <Rle> | <integer>
>    [1] chr1_random [272531, 272581]      + |        800
>    [2] chr1_random [272850, 272911]      + |        450
>
> after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively)
>
> mergedObject
>
>   seqnames           ranges strand                 |         object2Val   object1Val
>              <Rle>        <IRanges>  <Rle>         |         <integer>     <integer>
>    [1] chr1_random [272531, 272581]      + |        800               88
>    [2] chr1_random [272850, 272911]      + |        450               45
>

I fail to see how this result makes sense. If you think of the "val"
metadata column as a numerical variable defined along the genome, then,
in the merged object, it takes the value 88 over the [272531, 272581]
interval but in original object 'obj1' it was taking this value
over the [272531, 272571] interval.

The following merged object would make much more sense to me:

          seqnames          ranges  strand | object2Val object1Val
             <Rle>        <IRanges>  <Rle> |  <integer>  <integer>
   [1] chr1_random [272531, 272571]      + |        800         88
   [1] chr1_random [272572, 272581]      + |        800         NA
   [2] chr1_random [272850, 272870]      + |        450         NA
   [2] chr1_random [272871, 272911]      + |        450         45

Sounds like you need all the ranges in disjoin(union(obj1, obj2))
if you want to be able to represent the 2 numerical variables
accurately. But maybe I'm missing completely what you're trying
to achieve.

H.

>
>
>
> On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest at bioconductor.org> wrote:
>
>
> Hi!
>
> Problem summary: How to retrieve part of the sequence of mRNA around given location.
>
> I have the locations of the binding to mRNA events as GRanges (GRevents) and need to retrieve sequence for motif finding. The problem is that if I use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not transcript sequence, i.e. not accounting for introns or mRNA border. I have tried solving it with exonsBy("transcriptDb object", "tx") function but without success.
>
> Question: Is there a bioconductor-supported way of getting resolving the problem? With CLIPseq being more and more popular this will be very demanded function.
>
> Thanks,
> Lukasz
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.16.0                 GenomicFeatures_1.8.1
> [3] AnnotationDbi_1.18.4               Biobase_2.16.0
> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0
> [7] Biostrings_2.24.1                  GenomicRanges_1.8.3
> [9] IRanges_1.14.2                     BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.12.0  bitops_1.0-4.1  DBI_0.2-5       RCurl_1.91-1
> [5] Rsamtools_1.8.0 RSQLite_0.11.1  stats4_2.15.0   tools_2.15.0
> [9] XML_3.9-4       zlibbioc_1.2.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list