[BioC] how to merge GRanges objects
Hervé Pagès
hpages at fhcrc.org
Wed Oct 16 19:54:18 CEST 2013
Hi John,
On 10/16/2013 07:15 AM, John linux-user wrote:
> Hello everyone,
>
> I am wondering how to simply merge two GRanges objects by range field and add the value by additional vector. For example, I have two objects below
>
> obj1
>
> seqnames ranges strand | Val
> <Rle> <IRanges> <Rle> | <integer>
> [1] chr1_random [272531, 272571] + | 88
> [2] chr1_random [272871, 272911] + | 45
>
> obj2
> seqnames ranges strand | Val
> <Rle> <IRanges> <Rle> | <integer>
> [1] chr1_random [272531, 272581] + | 800
> [2] chr1_random [272850, 272911] + | 450
>
> after merged, it should be an object as the following mergedObject and it would concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were different from those of obj1, which were 571 and 871 respectively)
>
> mergedObject
>
> seqnames ranges strand | object2Val object1Val
> <Rle> <IRanges> <Rle> | <integer> <integer>
> [1] chr1_random [272531, 272581] + | 800 88
> [2] chr1_random [272850, 272911] + | 450 45
>
I fail to see how this result makes sense. If you think of the "val"
metadata column as a numerical variable defined along the genome, then,
in the merged object, it takes the value 88 over the [272531, 272581]
interval but in original object 'obj1' it was taking this value
over the [272531, 272571] interval.
The following merged object would make much more sense to me:
seqnames ranges strand | object2Val object1Val
<Rle> <IRanges> <Rle> | <integer> <integer>
[1] chr1_random [272531, 272571] + | 800 88
[1] chr1_random [272572, 272581] + | 800 NA
[2] chr1_random [272850, 272870] + | 450 NA
[2] chr1_random [272871, 272911] + | 450 45
Sounds like you need all the ranges in disjoin(union(obj1, obj2))
if you want to be able to represent the 2 numerical variables
accurately. But maybe I'm missing completely what you're trying
to achieve.
H.
>
>
>
> On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest] <guest at bioconductor.org> wrote:
>
>
> Hi!
>
> Problem summary: How to retrieve part of the sequence of mRNA around given location.
>
> I have the locations of the binding to mRNA events as GRanges (GRevents) and need to retrieve sequence for motif finding. The problem is that if I use getSeq(flank(GRevents, width=n)) then I get the genomic sequence not transcript sequence, i.e. not accounting for introns or mRNA border. I have tried solving it with exonsBy("transcriptDb object", "tx") function but without success.
>
> Question: Is there a bioconductor-supported way of getting resolving the problem? With CLIPseq being more and more popular this will be very demanded function.
>
> Thanks,
> Lukasz
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.16.0 GenomicFeatures_1.8.1
> [3] AnnotationDbi_1.18.4 Biobase_2.16.0
> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0
> [7] Biostrings_2.24.1 GenomicRanges_1.8.3
> [9] IRanges_1.14.2 BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.12.0 bitops_1.0-4.1 DBI_0.2-5 RCurl_1.91-1
> [5] Rsamtools_1.8.0 RSQLite_0.11.1 stats4_2.15.0 tools_2.15.0
> [9] XML_3.9-4 zlibbioc_1.2.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list