Dieter Vanderelst dieter_vanderelst at emailengine.org
Wed Oct 3 17:25:10 CEST 2007

Hi list,

I'm currently processing textual data and I would really appreciate some
help with one of my problems.

I have a set of strings and I want to count how often each of these
strings appears in the set.

This is not very difficult and can be done as:
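
(A minimal sketch of such a plain count in R, assuming the strings are
stored in a character vector. The name `strings` and the example values
are hypothetical and only serve as an illustration.)

```r
# Hypothetical example data; any character vector would do.
strings <- c("abab", "ababaaa", "ababaaa", "bbb", "abab")

# table() tabulates how often each distinct string occurs.
freq <- table(strings)
print(freq)
```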


However, I also want to collapse across sub-strings. That is, I want
each occurrence of a sub-string ss of string S to be counted as an
occurrence of string S.

So, 'abab' should be included in the count of 'ababaaa' and should not
be listed as a separate entry in the frequency table.
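
To make the desired result concrete, here is a rough sketch in R of one
possible reading of this requirement. It is not a known solution, only an
illustration under several assumptions: the strings are non-empty and held
in a character vector, matching is literal (no regular expressions), and a
sub-string contained in several longer strings is credited to each of
them. The function name `collapse_counts` and the example data are
hypothetical.

```r
# Hypothetical example data; not taken from real input.
strings <- c("abab", "ababaaa", "ababaaa", "bbb", "abab", "aaa")

collapse_counts <- function(x) {
  counts <- table(x)
  u <- names(counts)        # distinct strings
  n <- as.vector(counts)    # their raw frequencies

  # A string is dropped from the table if some other distinct string contains it.
  is_sub_of_other <- vapply(u, function(s) {
    any(grepl(s, u[u != s], fixed = TRUE))
  }, logical(1))
  maximal <- u[!is_sub_of_other]

  # Each remaining string collects the counts of every distinct string
  # it contains, itself included.
  vapply(maximal, function(S) {
    contained <- vapply(u, function(s) grepl(s, S, fixed = TRUE), logical(1))
    sum(n[contained])
  }, numeric(1))
}

res <- collapse_counts(strings)
print(res)
```

With this example data, 'abab' and 'aaa' no longer appear as separate
entries; their occurrences are added to the count of 'ababaaa'. The
pairwise comparison grows quadratically with the number of distinct
strings, so it is only meant for modest input sizes.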

Does anybody have a pointer to a way to do this? I have been checking
out the CRAN packages for handling DNA sequences, but this has not
really brought me closer to a solution.

