[R] Counting defined character within String
Marc Schwartz
marc_schwartz at me.com
Mon Jul 5 16:18:19 CEST 2010
On Jul 5, 2010, at 9:04 AM, Kunzler, Andreas wrote:
> Dear list,
>
> I'm looking for a way to count the number of "|" within an object.
> The character "|" is used to separate ids.
>
> Assume a data structure (d) like
>
> Var
> NA
> NA
> NA
> NA
> NA
> 1
> 1|2
> 1|22|45
> 3
> 4b|24789
>
> I need to know the maximum number of ids within one object. In this case 3 (1|22|45)
>
>
> Does anybody know a better way?
>
> Thanks
Presuming that your column is in a data frame called 'DF', where the 'Var' column is likely imported as a factor:
> DF
        Var
1      <NA>
2      <NA>
3      <NA>
4      <NA>
5      <NA>
6         1
7       1|2
8   1|22|45
9         3
10 4b|24789
> max(sapply(strsplit(as.character(DF$Var), split = "\\|"), length))
[1] 3
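To try this at the console, a data frame like the one shown can be built by hand. This is only a sketch: the values are copied from the question, and stringsAsFactors = TRUE is an assumption used to mimic a factor column as it might come out of an import.

```r
# Hypothetical reconstruction of the poster's data; the names 'DF' and 'Var'
# follow the ones used in this reply.
DF <- data.frame(
  Var = c(NA, NA, NA, NA, NA, "1", "1|2", "1|22|45", "3", "4b|24789"),
  stringsAsFactors = TRUE  # assumption: the column was imported as a factor
)

# as.character() converts the factor back to plain strings before splitting.
ids <- strsplit(as.character(DF$Var), split = "\\|")

# Number of pieces per row, then the largest count.
n_ids <- sapply(ids, length)
max_ids <- max(n_ids)
max_ids
```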
The above uses strsplit() to split each element of the column at the "|" character. Since "|" is the alternation operator in regular expressions, it has to be escaped with a double backslash to be matched literally:
> strsplit(as.character(DF$Var), split = "\\|")
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

[[5]]
[1] NA

[[6]]
[1] "1"

[[7]]
[1] "1" "2"

[[8]]
[1] "1"  "22" "45"

[[9]]
[1] "3"

[[10]]
[1] "4b"    "24789"
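As an alternative to escaping, strsplit() also accepts fixed = TRUE, which treats the split string literally rather than as a regular expression. A minimal sketch, using a made-up vector 'x' rather than the original data:

```r
# 'x' is a small illustrative vector, not the poster's data frame.
x <- c("1|2", "1|22|45", "4b|24789")

# fixed = TRUE matches "|" as a literal character, so no backslashes are needed.
parts_fixed <- strsplit(x, split = "|", fixed = TRUE)

# The escaped regular expression used in this reply gives the same pieces.
parts_regex <- strsplit(x, split = "\\|")

identical(parts_fixed, parts_regex)
```

Either form works here; fixed = TRUE is mostly a matter of readability when the separator is a regex metacharacter.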
Then use sapply() to apply length() to each list element, which gives the number of ids per row (note that an NA row counts as 1):
> sapply(strsplit(as.character(DF$Var), split = "\\|"), length)
[1] 1 1 1 1 1 1 2 3 1 2
Finally, wrap the result in max() to get the largest count, which is 3 here.
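If the goal is literally to count the "|" characters, as the subject line asks, one possible variation is to compare string lengths before and after removing the separator. This is a sketch under two assumptions that are not part of the reply above: the values are held in a plain character vector called 'var', and NA entries are dropped rather than counted as one id.

```r
# 'var' is a hypothetical character vector holding the same values as DF$Var.
var <- c(NA, "1", "1|2", "1|22|45", "3", "4b|24789")

# Drop missing entries so they do not contribute to the counts.
present <- var[!is.na(var)]

# Deleting every "|" shortens the string by exactly the number of separators.
n_sep <- nchar(present) - nchar(gsub("|", "", present, fixed = TRUE))

# An entry with k separators holds k + 1 ids.
n_ids <- n_sep + 1L
max(n_ids)
```

The result agrees with the strsplit() approach for the non-missing rows; the only difference is how NA is treated.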
HTH,
Marc Schwartz