[R] Nested functions
Thomas Lumley
tlumley at u.washington.edu
Tue Jul 18 16:55:31 CEST 2006
On Mon, 17 Jul 2006, John Wiedenhoeft wrote:
> Hi there,
>
> I'm having myself a hard time writing an algorithm for finding patterns
> within a given melody. In a vector I'd like to find ALL sequences that
> occur at least twice, without having to check all possible patterns via
> pattern matching.
>
Another approach, which works for not-too-long vectors like you have is:
n <- length(v)
matches <- outer(v, v, "==") & outer(1:n,1:n,">=")
Now matches has TRUE where v[i]==v[j]. For a longer match you would also
need v[i+1]==v[j+1] and so on, making a diagonal line through the matrix.
Diagonal lines are hard, so let's turn them into horizontal lines
matches <- matrix(cbind(matches, FALSE), ncol=n)
now row i+1 column j of matches is TRUE for a single entry match starting
at position j at a separation of i. If there is a match of length 2, then
column j+1 will also be TRUE, and so on.
Now rle() applied to a row will return the lengths of consecutive
sequences of TRUE and FALSE. The lengths of consecutive sequences of TRUE
are the lengths of the matches. To get rid of trivial matches of length
less than 2 do
match2 <- t(apply(matches,1,function(row){
r<-rle(row)
r$values[r$lengths<2]<-FALSE
inverse.rle(r)
}))
And finally, to extract the matches
results <- apply(match2, 1, function(row){
r<-rle(row)
n<-length(r$lengths)
ends<-cumsum(r$lengths)
starts<-cumsum(c(1,r$lengths))[1:n]
list(starts[r$values],ends[r$values])
})
for starts and ends of matches or
results <- apply(match2, 1, function(row){
r<-rle(row)
n<-length(r$lengths)
ends<-cumsum(r$lengths)[r$values]
starts<-cumsum(c(1,r$lengths))[1:n][r$values]
mapply(function(stt,end) v[stt:end],starts,ends,
SIMPLIFY=FALSE)
})
to get a list of the actual matching sequences.
-thomas
More information about the R-help
mailing list