[R] searching several subsequences in a single string sequence
Barry Rowlingson
b.rowlingson at lancaster.ac.uk
Tue Sep 27 19:06:21 CEST 2011
On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya <marceloa27 at gmail.com> wrote:
> Hi all
>
> I am analyzing bird song element sequences. I would like to know how I can
> count how many times a given subsequence occurs in a single string sequence.
>
> For example, if I have this single sequence:
>
> ABCABAABABABCAB
>
> and I am looking for the subsequence "ABC", what I need to get is that the
> subsequence is found twice.
>
> Any idea how I can do this?
>
gregexpr will return the position and length of multiple matches. And
you can feed it a vector. So:
> songs=c("ABCABAABABABCAB","ABACAB","ABABCABCBC")
> m="ABC"
> gregexpr(m,songs)
[[1]]
[1] 1 11
attr(,"match.length")
[1] 3 3
[[2]]
[1] -1
attr(,"match.length")
[1] -1
[[3]]
[1] 3 6
attr(,"match.length")
[1] 3 3
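- in the first item, it was found at positions 1 and 11
- in the second it wasn't found at all (gregexpr signals that with -1)
- in the third, it was found at positions 3 and 6
So just do some apply-ing to the returned list and count the positive
positions in each element. A plain length() would report 1 for the
no-match case, because of that -1. Job done!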
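For what it's worth, the apply-ing step might look something like the
sketch below. The names hits, counts and count_overlapping are made up for
illustration, and fixed=TRUE assumes the subsequence is a literal string
rather than a regular expression. gregexpr only reports non-overlapping
matches, so the second part shows one possible workaround using a Perl
lookahead, in case overlapping occurrences should be counted too.

```r
songs <- c("ABCABAABABABCAB", "ABACAB", "ABABCABCBC")
m <- "ABC"

# fixed = TRUE treats m as a literal string, not a regular expression
hits <- gregexpr(m, songs, fixed = TRUE)

# gregexpr() returns -1 when there is no match, so count only the
# positive start positions instead of taking length()
counts <- sapply(hits, function(pos) sum(pos > 0))
counts
# [1] 2 0 2

# Hypothetical helper: count overlapping occurrences with a zero-width
# lookahead. This assumes the pattern contains no regex metacharacters.
count_overlapping <- function(pattern, x) {
  lookahead <- paste0("(?=", pattern, ")")
  sapply(gregexpr(lookahead, x, perl = TRUE), function(pos) sum(pos > 0))
}

overlap_example <- count_overlapping("ABA", "ABABA")
```

For a pattern like "ABC" that cannot overlap itself, both approaches give
the same counts; they only differ for patterns such as "ABA".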
Barry
PS bonus points for spotting the hidden prog-rock song title.