[BioC] multiple hits with countOverlaps function
Wei Shi
shi at wehi.EDU.AU
Thu Apr 14 01:43:41 CEST 2011
Hi,
I am using countOverlaps function in GenomicFeatures package to count how many sequencing reads are mapped to each genomic feature (e.g. promoter region, gene region etc.). However I found that if a read was mapped to more than one features, then the count of each feature will be increased by 1, i.e. this read was counted more than one times although it can only originate from one genomic location. Below is a simple example to show this:
-------------------------
features <- GRanges(seqnames="chr1", ranges=IRanges(c(100,500),c(1000,1500)) ,strand=c("+","+"))
features
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [100, 1000] + |
[2] chr1 [500, 1500] + |
seqlengths
chr1
NA
reads <- GRanges(seqnames="chr1",ranges=IRanges(c(150,600),c(250,700)),strand=c("+","+"))
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [150, 250] + |
[2] chr1 [600, 700] + |
seqlengths
chr1
NA
countOverlaps(features,reads)
[1] 2 1
--------------------
I think It will be a better to have an argument in countOverlaps function which specifies whether a single hit (eg. a random one) or multiple hits should be returned for each query sequence.
Below is my session info:
sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.5.1 Biostrings_2.18.2 BSgenome_1.18.3 DBI_0.2-5 RCurl_1.4-3
[7] RSQLite_0.9-2 rtracklayer_1.10.6 tools_2.12.0 XML_3.2-0
Cheers,
Wei
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioconductor
mailing list