[R] Making a table: collapsing across sub-strings

Dieter Vanderelst dieter_vanderelst at emailengine.org
Thu Oct 4 09:46:54 CEST 2007


A sub-string can occur anywhere in the main string.

I think I could use table() and then add the numbers. But I don't know how 
to access the numbers in the result of table().
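
For what it's worth, here is a minimal sketch of how the counts in a table() result might be accessed. The vector my_set below is made-up example data, not the real data set; the result of table() behaves like a named integer vector, so single counts can be pulled out by name and added.

```r
# Made-up example data (an assumption, not the poster's real strings)
my_set <- c("abab", "ababaaa", "abab", "ba")

TB <- table(my_set)

# The distinct strings are the names of the table
strings_seen <- names(TB)

# The counts as a plain integer vector, in the same order as names(TB)
counts <- as.vector(TB)

# The count for one string, looked up by name
n_abab <- TB[["abab"]]

# Counts can be added like ordinary numbers
n_sum <- TB[["abab"]] + TB[["ababaaa"]]

# A data frame view, if columns are easier to work with
df <- as.data.frame(TB, stringsAsFactors = FALSE)
```

Indexing with [[ ]] drops the table attributes and returns a bare number, which is convenient for arithmetic.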

Another problem is that there might be a hierarchy in the strings. That 
is, string a might be a sub-string of b while b might be a sub-string of c. 
So, when checking the strings, I would have to start with the longest 
string and find all sub-strings of that one. And then I should check the 
second longest string, and so on...
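
To make the hierarchy concrete, the following is a rough sketch of how the "is a sub-string of" relation between all pairs of strings could be tabulated with grepl(). The example strings and the helper name is_inside are assumptions made for illustration; fixed = TRUE is used so the strings are matched literally rather than as regular expressions.

```r
# Made-up example strings (an assumption for illustration)
strings <- c("ab", "abab", "ababaaa", "xyz")

# Hypothetical helper: TRUE when 'pattern' occurs anywhere inside 'target'
is_inside <- function(pattern, target) grepl(pattern, target, fixed = TRUE)

# contains[p, s] is TRUE when string p occurs somewhere inside string s
contains <- sapply(strings, function(target) {
  sapply(strings, is_inside, target = target)
})

# The row for "ab" shows the chain: "ab" is inside "abab", which is inside "ababaaa"
contains["ab", ]
```

Every string trivially contains itself, so the diagonal of the matrix is TRUE.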

But I cannot find a way of ordering strings by their length.
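
One possible way to order strings by length is order() on nchar(). The sketch below combines that with the longest-first idea described above. The data are made up, and one rule is an assumption that may not match the intended behaviour: when a string is contained in several longer strings, its count is credited to the longest one only.

```r
# Made-up example data (an assumption for illustration)
my_set <- c("abab", "ababaaa", "ab", "xyz", "abab", "ababaaa", "xyz", "ab")

tb <- table(my_set)

# Distinct strings, longest first
u <- names(tb)
u <- u[order(nchar(u), decreasing = TRUE)]

kept <- character(0)   # strings that keep their own entry
counts <- integer(0)   # collapsed counts, named by the kept strings

for (s in u) {
  # Kept strings (all at least as long as s) that contain s literally
  host <- kept[grepl(s, kept, fixed = TRUE)]
  if (length(host) > 0) {
    # Assumed rule: credit the count to the first, i.e. longest, host
    counts[host[1]] <- counts[host[1]] + tb[[s]]
  } else {
    # s is not part of any longer string, so it gets its own entry
    kept <- c(kept, s)
    counts[s] <- tb[[s]]
  }
}

counts
```

With these example data, "abab" and "ab" are folded into "ababaaa", and "xyz" keeps its own entry.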


jim holtman wrote:
> How do you determine if one string is a subset of another?  Does it
> only match at the beginning, or anywhere?  How large is your set of
> strings?  Can you use table as you describe and then determine what
> the groupings of subsets are and then just add the numbers together?
> You can use grep/regexpr to determine if one string is a subset of
> another.
> On 10/3/07, Dieter Vanderelst <dieter_vanderelst at emailengine.org> wrote:
>> Hi list,
>> I'm currently processing textual data and I would really appreciate some
>> help with one of my problems.
>> I have a set of strings and I want to count how often each of these
>> strings appears in this set.
>> This is not very difficult and can be done as:
>> TB<-table(my_set)
>> plot(TB)
>> However, I also want to collapse across sub-strings. That is, I want a
>> sub-string ss of string S to be counted as an occurrence of string S.
>> So, 'abab' should be included in the count of 'ababaaa' and should not
>> be listed as a separate entry in the frequency table.
>> Does somebody have a pointer to a way to do this? I have been checking
>> out the CRAN packages for handling DNA sequences, but this has not
>> really brought me closer to a solution.
>> Thanks,
>> Dieter Vanderelst
>> ------------------------------------------
>> Dieter Vanderelst
>> Eindhoven University of Technology
>> Faculty of Industrial Design
>> Designed Intelligence Group
>> Den Dolech 2
>> 5612 AZ Eindhoven
>> The Netherlands
>> Tel +31 40 247 91 11
