[BioC] GRanges list and reduce function
Martin Morgan
mtmorgan at fhcrc.org
Mon Aug 25 19:13:57 CEST 2014
On 08/25/2014 05:31 AM, Asma rabe wrote:
> Hi Vincent, Martin,
>
> Thank you very much for your kind explanation.
>
> For Martin:
>
>>For exons grouped by _gene_, it's possible that genes are annotated to contain exons from different chromosomes
>
> How can genes be annotated to contain exons from many different chromosomes?
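I don't know, but they are! You can see the reason for some of these (in the
first gene below, the same exon is annotated on chr6 and on each of the chr6
alternate haplotype sequences); there are more interesting examples.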
> exByGn[elementLengths(unique(seqnames(exByGn))) > 1]
GRangesList of length 277:
$100126314
GRanges with 7 ranges and 2 metadata columns:
            seqnames               ranges strand |   exon_id   exon_name
               <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]           chr6 [30552109, 30552194]      + |     87067        <NA>
  [2]  chr6_cox_hap2 [ 2064162,  2064247]      + |    278963        <NA>
  [3]  chr6_dbb_hap3 [ 1845750,  1845835]      + |    280931        <NA>
  [4] chr6_mann_hap4 [ 1900181,  1900266]      + |    282770        <NA>
  [5]  chr6_mcf_hap5 [ 1933963,  1934048]      + |    284213        <NA>
  [6]  chr6_qbl_hap6 [ 1845017,  1845102]      + |    286075        <NA>
  [7] chr6_ssto_hap7 [ 1884391,  1884476]      + |    287961        <NA>
$100128977
GRanges with 4 ranges and 2 metadata columns:
             seqnames               ranges strand |   exon_id   exon_name
  [1]           chr17 [43920722, 43921527]      - |    227980        <NA>
  [2]           chr17 [43972846, 43972879]      - |    227981        <NA>
  [3] chr17_ctg5_hap1 [  894694,   894727]      + |    289539        <NA>
  [4] chr17_ctg5_hap1 [  946013,   946818]      + |    289540        <NA>
...
<275 more elements>
---
seqlengths:
chr1 chr2 ... chrUn_gl000249
249250621 243199373 ... 38502
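
The operations discussed in this thread can be tried on a small toy
GRangesList rather than the full hg19 annotation. What follows is only a
minimal sketch: the gene names and coordinates are made up for illustration,
and it assumes a Bioconductor release in which elementLengths() has been
renamed elementNROWS() (the rename happened after this message was written).

```r
library(GenomicRanges)

# Toy data (hypothetical gene names and coordinates, not real annotation).
gr <- GRanges(
    seqnames = c("chr10", "chr10", "chr10", "chr6", "chr6_cox_hap2", "chr1"),
    ranges = IRanges(start = c(100, 150, 400, 10, 10, 1),
                     end   = c(200, 250, 500, 50, 50, 20)),
    strand = c("+", "+", "+", "+", "+", "-"))

# Group the ranges by (made-up) gene, giving a GRangesList.
toy <- split(gr, c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"))

# Number of distinct seqnames per list element; values > 1 flag genes
# annotated on more than one sequence.
n_seq <- elementNROWS(unique(seqnames(toy)))
multi <- toy[n_seq > 1]

# One logical per list element: are all of its ranges on chr10?
all_chr10 <- all(seqnames(toy) == "chr10")

# Keep only chr10 ranges inside each element; the list keeps its length,
# so elements with nothing on chr10 become empty.
on_chr10 <- toy[seqnames(toy) == "chr10"]

# Drop the empty elements.
nonempty <- on_chr10[elementNROWS(on_chr10) > 0]

# reduce() merges overlapping ranges within each list element separately.
merged <- reduce(toy)
```

Printing on_chr10 and nonempty side by side shows why a filtered list can look
"empty" at the top while still holding ranges further down.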
>
>
>
> Best Regards,
> Asma
>
>
> On Fri, Aug 15, 2014 at 11:56 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 08/15/2014 03:20 AM, Asma rabe wrote:
>
> Hi ,
>
>
> I need a GRanges object with exon data for a few chromosomes. I got a GRanges
> list of transcripts and their exons as follows:
>
>
> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
>
> txdb<-TxDb.Hsapiens.UCSC.hg19.knownGene
>
> tx_Exons<-exonsBy(txdb)
>
>
>
> 1-How do I use reduce on a GRanges list? How do I get the unique exons only
> and exclude redundant exons?
>
>
> I'm not sure what this means -- you've asked for exons grouped by
> transcript, and there are not 'extra' exons in each transcript. Did you want
> exonsBy(txdb, "gene") ?
>
> reduce(tx_Exons) reduces within each transcript (list element); I'm not sure
> what you'd really like to do?
>
>
>
> 2-How do I select exons of certain chromosomes only, e.g. chr10? I tried the
> following, but I wonder why I got a GRanges list with empty GRanges elements?
>
>
> if you want to select transcripts where all exons are in certain
> chromosomes, note that
>
> seqnames(tx_Exons) %in% "chr10"
>
> returns an RleList, and
>
> all(seqnames(tx_Exons) %in% "chr10")
>
> asks element-wise whether all elements of each Rle are TRUE, returning a
> logical vector of the same length as tx_Exons. So
>
> tx_Exons[all(seqnames(tx_Exons) %in% "chr10")]
>
> returns the transcripts with all exons on chr10. For exons grouped by _gene_,
> it's possible that genes are annotated to contain exons from different
> chromosomes
>
> exByGn = exonsBy(txdb, "gene")
> table(elementLengths(runLength(seqnames(exByGn))))
>
>
>     1     2     3     4     5     6     7     8
> 23182    77     4     3    19    38    76    60
>
> and only exons in chr10, preserving grouping by gene (genes without any
> exons in chr10 become empty elements), are
>
> chr10 <- exByGn[seqnames(exByGn) %in% "chr10"]
>
>
> this is what you did below. The result is not empty; it contains the many
> transcripts whose exons are not in chr10 (now empty elements), plus those
> deeper in the list that are on chr10. Here I remove the elements with 0
> elements.
>
> chr10[elementLengths(chr10) != 0]
>
>
> Martin
>
>
>
> chr10<-tx_Exons[seqnames(tx_Exons)=="chr10",]
>
>
> chr10
>
>
> GRangesList of length 80922:
>
> $1
>
> GRanges with 0 ranges and 3 metadata columns:
>
> seqnames ranges strand | exon_id exon_name exon_rank
>
> <Rle> <IRanges> <Rle> | <integer> <character> <integer>
>
>
> $2
>
> GRanges with 0 ranges and 3 metadata columns:
>
> seqnames ranges strand | exon_id exon_name exon_rank
>
>
> $3
>
> GRanges with 0 ranges and 3 metadata columns:
>
> seqnames ranges strand | exon_id exon_name exon_rank
>
>
> ...
>
> <80919 more elements>
>
> ---
>
> seqlengths:
>
> chr1 chr2 ... chrUn_gl000249
>
> 249250621 243199373 ... 38502
>
>
>
> length(chr10)
>
>
> [1] 80922
>
> length(tx_Exons)
>
>
> [1] 80922
>
>
> Thank you
>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>