[BioC] countMatches() (was: table for GenomicRanges)
Hervé Pagès
hpages at fhcrc.org
Fri Jan 4 22:11:08 CET 2013
Hi,
I added findMatches() and countMatches() to the latest IRanges /
GenomicRanges packages (in BioC devel only).
  findMatches(x, table): An enhanced version of ‘match’ that
      returns all the matches in a Hits object.

  countMatches(x, table): Returns an integer vector of the length
      of ‘x’, containing the number of matches in ‘table’ for
      each element in ‘x’.

countMatches() is what you can use to tally/count/tabulate (choose your
preferred term) the unique elements in a GRanges object:

  library(GenomicRanges)
  set.seed(33)
  gr <- GRanges("chr1", IRanges(sample(15, 20, replace=TRUE), width=5))

Then:

  > gr_levels <- sort(unique(gr))
  > countMatches(gr_levels, gr)
   [1] 1 1 1 2 4 2 2 1 2 2 2

Note that findMatches() and countMatches() also work on IRanges and
DNAStringSet objects, as well as on ordinary atomic vectors:

  library(hgu95av2probe)
  library(Biostrings)
  probes <- DNAStringSet(hgu95av2probe)
  unique_probes <- unique(probes)
  count <- countMatches(unique_probes, probes)
  max(count)  # 7

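
For readers who want to see the semantics on ordinary atomic vectors without installing anything, here is a minimal base-R sketch. It is only an emulation under my own assumptions, not the IRanges implementation: the helper names find_matches_sketch() and count_matches_sketch() are hypothetical, and a plain data frame stands in for the Hits object.

```r
# Base-R emulation of the semantics described above.
# NOT the IRanges code; both function names are hypothetical.

find_matches_sketch <- function(x, table) {
  # Compare every element of 'x' against every element of 'table'.
  eq <- outer(x, table, FUN = "==")
  # Row index = position in 'x', column index = position in 'table'.
  hits <- which(eq, arr.ind = TRUE)
  hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
  # A data frame stands in for the Hits object here.
  data.frame(queryHits = unname(hits[, 1]),
             subjectHits = unname(hits[, 2]))
}

count_matches_sketch <- function(x, table) {
  # For each element of 'x', count how many elements of 'table' equal it.
  vapply(seq_along(x),
         function(i) sum(table == x[i], na.rm = TRUE),
         integer(1))
}

x   <- c("a", "b", "c")
tbl <- c("b", "a", "b", "d", "b")

hits   <- find_matches_sketch(x, tbl)   # all (x, table) pairs that are equal
counts <- count_matches_sketch(x, tbl)  # 1 3 0

# Tallying the unique elements, as in the GRanges example above:
lev   <- sort(unique(tbl))
tally <- count_matches_sketch(lev, tbl)  # 1 3 1
```

The sketch compares all pairs, so it is quadratic in time and memory and only meant to show what is being computed, not how to compute it efficiently.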
I made other changes in IRanges/GenomicRanges so that the notion
of "match" between elements of a vector-like object now consistently
means "equality" instead of "overlap", even for range-based objects
like IRanges or GRanges objects. This is the same notion of "equality"
that == uses. The most visible consequence of those changes is that
using %in% between 2 IRanges or GRanges objects 'query' and 'subject'
to test for overlaps has been replaced by overlapsAny(query, subject).

  overlapsAny(query, subject): Finds the ranges in ‘query’ that
      overlap any of the ranges in ‘subject’.

There are warnings and deprecation messages in place to help smooth
the transition.
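
To make the "equality" versus "overlap" distinction concrete, here is a small base-R sketch on closed integer ranges stored as plain start/end vectors. It is an illustration under my own assumptions, not package code: ranges_in_sketch() and overlaps_any_sketch() are hypothetical helpers that mimic what %in% and overlapsAny() are described as doing above.

```r
# Base-R illustration only; both helper names are hypothetical.
# Ranges are closed integer intervals [start, end].

ranges_in_sketch <- function(q_start, q_end, s_start, s_end) {
  # "Equality": a query range matches only a subject range with the
  # same start and the same end.
  paste(q_start, q_end) %in% paste(s_start, s_end)
}

overlaps_any_sketch <- function(q_start, q_end, s_start, s_end) {
  # "Overlap": a query range matches if it shares at least one
  # position with any subject range.
  vapply(seq_along(q_start),
         function(i) any(q_start[i] <= s_end & q_end[i] >= s_start),
         logical(1))
}

q_start <- c(1L, 10L, 30L); q_end <- c(5L, 14L, 34L)
s_start <- c(3L, 10L);      s_end <- c(7L, 14L)

is_equal_match <- ranges_in_sketch(q_start, q_end, s_start, s_end)
# FALSE TRUE FALSE
has_overlap <- overlaps_any_sketch(q_start, q_end, s_start, s_end)
# TRUE TRUE FALSE
```

The first query range [1, 5] overlaps [3, 7] but is not equal to any subject range, which is exactly the case where code that used %in% to test for overlaps needs to switch to overlapsAny().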
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319