[R] matching subvectors in vector sets
jim holtman
jholtman at gmail.com
Fri Apr 17 18:51:36 CEST 2009
How about this:
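> x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
> # split the string into its individual ids
> x.s <- unlist(strsplit(x, ":"))
> # for each window width i, tabulate every run of i consecutive ids
> for (i in 2:length(x.s)){
+     # each row of x.seq holds the indices of one window, in ascending order
+     x.seq <- embed(length(x.s):1, i)
+     print(table(apply(x.seq, 1, function(z){
+         paste(x.s[z], collapse=":")
+     })))
+ }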

A00096:A00096 A00096:A02178 A02178:A02178 A02178:A07776
            3             1             1             1

A00096:A00096:A00096 A00096:A00096:A02178 A00096:A02178:A02178
                   2                    1                    1
A02178:A02178:A07776
                   1

A00096:A00096:A00096:A00096 A00096:A00096:A00096:A02178
                          1                           1
A00096:A00096:A02178:A02178 A00096:A02178:A02178:A07776
                          1                           1

A00096:A00096:A00096:A00096:A02178 A00096:A00096:A00096:A02178:A02178
                                 1                                  1
A00096:A00096:A02178:A02178:A07776
                                 1

A00096:A00096:A00096:A00096:A02178:A02178
                                        1
A00096:A00096:A00096:A02178:A02178:A07776
                                        1

A00096:A00096:A00096:A00096:A02178:A02178:A07776
                                               1
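
The loop above works on a single string. The original question asks for counts over the whole list, so here is one possible sketch of that extension. It is not from the thread: the function name count_subvectors and its arguments min_len and sep are made up for illustration, and it uses plain index windows rather than embed(). It pools every run of at least min_len consecutive ids from all elements into one table.

```r
# Hypothetical helper (not from the thread): count every contiguous run of
# at least `min_len` ids across all elements of a character vector.
count_subvectors <- function(x, min_len = 2, sep = ":") {
  runs <- unlist(lapply(strsplit(x, sep, fixed = TRUE), function(ids) {
    n <- length(ids)
    # elements shorter than min_len contribute nothing
    if (n < min_len) return(character(0))
    unlist(lapply(min_len:n, function(k) {
      # one window of width k starting at each position 1..(n - k + 1)
      vapply(seq_len(n - k + 1),
             function(s) paste(ids[s:(s + k - 1)], collapse = sep),
             character(1))
    }))
  }))
  # most frequent subvectors first
  sort(table(runs), decreasing = TRUE)
}

# two of the elements from the original post
x <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
       "A00096:A01352:A01352:A02023:A05001:A05001:A07776")
counts <- count_subvectors(x)
head(counts)
```

Called on the first element alone, this should give the same counts as the tables above, just pooled into one table. For ~20000 elements the number of windows grows with the square of each element's length, so it may be worth capping the window width if the elements get long.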
On Fri, Apr 17, 2009 at 9:33 AM, Albert Vilella <avilella at gmail.com> wrote:
> Starting with the first entry:
> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>
> and supposing no other element in the set contains identical subvectors, the
> algorithm will slide through the vector, first in pairs, then in trios, then
> in sets of four, etc., and count the occurrences:
>
> A00096:A00096
> 3
> A00096:A02178
> 1
> A02178:A02178
> 1
> A02178:A07776
> 1
> A00096:A00096:A00096
> 2
> A00096:A00096:A02178
> 1
> A00096:A02178:A02178
> 1
> A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096
> 1
> A00096:A00096:A00096:A02178
> 1
> A00096:A00096:A02178:A02178
> 1
> A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178
> 1
> A00096:A00096:A00096:A02178:A02178
> 1
> A00096:A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178:A02178
> 1
> A00096:A00096:A00096:A02178:A02178:A07776
> 1
> A00096:A00096:A00096:A00096:A02178:A02178:A07776
> 1
>
> On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Can you provide the output that you would expect from the data you
>> gave? I am not sure what you mean by a 'subvector'.
>>
>> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella <avilella at gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I've got a list of ~20000 elements that look like this:
>> >
>> > [1] "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>> > [2] "A00046:A00076:A01101:A04146:A05671:A07169"
>> > [3] "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
>> > [4] "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
>> > [5] "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
>> > [6] "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
>> > ...
>> >
>> > And I would like to have a table with the frequency of occurrences for
>> > matching subvectors in all elements, i.e., not
>> > only the number of times a vector is found but also how many times a
>> > subvector (of at least 2 ids) is found.
>> >
>> > How can I do that?
>> > Thanks in advance,
>> > Albert.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?