[R] Count matches of a sequence in a vector?
David Winsemius
dwinsemius at comcast.net
Wed Apr 21 23:14:31 CEST 2010
On Apr 21, 2010, at 11:07 AM, Jeff Brown wrote:
> At April 21, 2010 10:16:10 AM EDT mieke posted to Nabble:
>> Hey there,
>>
>> I need to count the matches of a sequence seq=c(2,3,4) in a long vector
>> v=c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5,....).
>> With sum(v %in% seq) I only get the sum of sum(v %in% 2), sum(v %in% 3) and
>> sum(v %in% 4), but that's not what I need :(
>>
>
> This sort of calculation can't be vectorized; you'll have to iterate through
> the sequence, e.g. with a "for" loop. I don't know if a routine has already
> been written.
A vectorized solution:
vseq <- c(2,3,4)
v <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
sum( v[1:(length(v) - 2)] == vseq[1] &
     v[2:(length(v) - 1)] == vseq[2] &
     v[3:length(v)]       == vseq[3] )
# [1] 1
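The expression above is written out for a three-element pattern. A minimal sketch of the same shifted-comparison idea for a pattern of any length follows; the name count_seq is hypothetical (not from any package), and it assumes neither vector contains NA:

```r
# Hypothetical helper: count contiguous occurrences of `pattern` in `v`
# by AND-ing one shifted slice comparison per pattern element.
count_seq <- function(v, pattern) {
  n <- length(v)
  k <- length(pattern)
  if (k == 0 || k > n) return(0L)
  hits <- rep(TRUE, n - k + 1)          # one flag per possible start position
  for (j in seq_len(k)) {
    # slice j starts at position j and has the same length as `hits`
    hits <- hits & (v[j:(n - k + j)] == pattern[j])
  }
  sum(hits)
}

v <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
count_seq(v, c(2,3,4))   # 1
count_seq(v, c(2,3))     # 2
```

The loop runs over the pattern, not over v, so each step is still a whole-vector comparison.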
And a check on relative speed, which was also a concern you expressed:
require(rbenchmark)
require(zoo)
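# Shifted-slice comparison (written out for a length-3 pattern)
logsum <- function(v, vseq) sum( v[1:(length(v) - 2)] == vseq[1] &
                                 v[2:(length(v) - 1)] == vseq[2] &
                                 v[3:length(v)]       == vseq[3] )
lseq <- length(vseq)
lv <- length(v)
# Rolling window of width 3 via zoo
sumroll <- function(v, vseq) sum( rollapply(zoo(v), 3, function(x) all(x == vseq)) )
# Explicit loop over start positions; note that it reads the globals lv and lseq
summatches <- function(v, vseq) sum( sapply(1:(lv - lseq + 1),
                                     function(i) all(v[i:(i + lseq - 1)] == vseq)) )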
> benchmark(
+ logsum(v, vseq),
+ summatches(v,vseq),
+ sumroll(v,vseq),
+ order=c('replications', 'elapsed'))
                 test replications elapsed relative user.self sys.self user.child sys.child
1     logsum(v, vseq)          100   0.002      1.0     0.003    0.001          0         0
2 summatches(v, vseq)          100   0.016      8.0     0.016    0.000          0         0
3    sumroll(v, vseq)          100   0.087     43.5     0.087    0.001          0         0
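The timings above are for a 15-element vector. To repeat the comparison on a longer vector without extra packages, something like the following base-R sketch could be used. The names big_v, pat, shifted and looped, the seed, and the vector length are all arbitrary choices for illustration; no timings are shown because they depend on the machine:

```r
set.seed(1)                                  # arbitrary seed, for repeatability only
big_v <- sample(1:5, 1e4, replace = TRUE)    # hypothetical longer test vector
pat <- c(2, 3, 4)

# Shifted-slice comparison, same idea as logsum (length-3 pattern only)
shifted <- function(v, p) {
  n <- length(v)
  sum(v[1:(n - 2)] == p[1] & v[2:(n - 1)] == p[2] & v[3:n] == p[3])
}

# Window-by-window comparison, same idea as summatches,
# but with the lengths computed inside the function
looped <- function(v, p) {
  k <- length(p)
  sum(sapply(seq_len(length(v) - k + 1),
             function(i) all(v[i:(i + k - 1)] == p)))
}

# Both approaches should agree before their timings are compared
stopifnot(shifted(big_v, pat) == looped(big_v, pat))
system.time(shifted(big_v, pat))
system.time(looped(big_v, pat))
```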
--
David Winsemius, MD
West Hartford, CT