[R] use sliding window to count substrings found in large string

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 7 18:50:31 CEST 2010

On Wed, Jul 7, 2010 at 12:25 PM, Immanuel <mane.desk at googlemail.com> wrote:
> Hello everyone,
> I'm looking for advice on how to run some tests on strings.
> What I want to do is the following
> (just an example; the real strings/sequences are about 200-400 characters long):
> Given a set of strings:
> String1 abcdefgh
> String2 bcdefgop
> use a sliding window of size x to create a vector of all subsequences
> of size x found in the set (order matters!).
> Now create, for every string in the set, a vector containing the counts
> of how often each subsequence was found in that particular string.
> It would be great if someone could give me a rough outline of how to
> start and which methods to use.
> I read through the man pages and googled a lot, but still don't know
> how to approach this.

Try this:

# generate an input string n long
n <- 300
lets <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")

# get rolling k-length sequences and count
k <- 3
table(substring(lets, 1:(n-k+1), k:n))
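
The same idea can be stretched to a set of strings, which is closer to what the question asks for. What follows is only a rough sketch: kmers() is a made-up helper name (not from any package), and the two example strings are the ones from the question. It collects every k-length subsequence seen anywhere in the set, then counts each one per string.

```r
# hypothetical helper: all k-length windows of a single string
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))   # string shorter than the window
  substring(s, 1:(n - k + 1), k:n)
}

# example strings taken from the question
strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
k <- 3

# every subsequence of length k found anywhere in the set
vocab <- sort(unique(unlist(lapply(strings, kmers, k = k))))

# one row per string, one column per subsequence; using a factor with
# fixed levels makes table() report zero counts for absent subsequences
counts <- t(sapply(strings, function(s)
  table(factor(kmers(s, k), levels = vocab))))
counts
```

Each row of counts is then the count vector for one string, with the columns in the same order for every string.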
