[R] Counting the occurrences of a character within a string

David Winsemius dwinsemius at comcast.net
Fri Dec 2 05:38:04 CET 2011


On Dec 1, 2011, at 11:11 PM, Bert Gunter wrote:

> strsplit is certainly an alternative, but your approach is
> unnecessarily complicated and inefficient. Do this, instead:
>
> sapply(strsplit(x,"/"),length)-1

Definitely more compact than the regex alternatives I came up with, but
one of these might still appeal in situations where it is desirable
to have the source strings as labels:

 > sapply( sapply(x$Col1, gregexpr, pattern="/"), length)
     abc/def ghi/jkl/mno
           1           2
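
One caveat with the gregexpr version: gregexpr() returns -1 (a vector of
length 1) when there is no match, so a string containing no "/" would be
counted as 1 rather than 0. Below is a minimal sketch that guards against
this. It assumes the same x as in the thread plus a made-up third string
without a slash; count_char is a hypothetical helper name, not a base R
function.

```r
# Same data as in the thread, plus a hypothetical string with no "/".
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno", "pqr"),
                stringsAsFactors = FALSE)

# Naive version: length of the match-position vector.
# "pqr" has no match, gregexpr() reports -1, and length(-1) is 1.
naive <- sapply(gregexpr("/", x$Col1), length)

# count_char is an illustrative helper, not part of base R.
count_char <- function(strings, ch = "/") {
  m <- gregexpr(ch, strings, fixed = TRUE)
  # Count only real match positions (> 0), so "no match" gives 0.
  counts <- vapply(m, function(pos) sum(pos > 0), integer(1))
  # Keep the source strings as labels.
  setNames(counts, strings)
}

res <- count_char(x$Col1)
print(naive)  # 1 2 1
print(res)    # 1 2 0, labelled by the source strings
```

Using fixed = TRUE treats the character literally, which should also avoid
surprises if the character being counted is a regex metacharacter such as
"." or "|".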

 > nchar( sapply(x$Col1, gsub, pattern="[^/]", replacement="" ) )
     abc/def ghi/jkl/mno
           1           2
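
Since the original question asked for the count as an appended column, here
is a rough sketch of that step using the gsub/nchar idea. It assumes the same
x, with one made-up extra row ending in "/" to show where the strsplit
version can differ: strsplit() drops a trailing empty piece, so a slash at
the end of a string is not counted. The column names cnt and cnt_split are
just illustrative.

```r
# Data from the thread, plus a hypothetical row with a trailing "/".
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno", "abc/"),
                stringsAsFactors = FALSE)

# Delete everything that is not "/", then count what is left.
x$cnt <- nchar(gsub("[^/]", "", x$Col1))

# The strsplit version, for comparison: number of pieces minus one.
# "abc/" splits into the single piece "abc", so this gives 0, not 1.
x$cnt_split <- sapply(strsplit(x$Col1, "/"), length) - 1

print(x)
```

For strings like the two in the original post both columns agree; they only
differ when a string ends with the character being counted (or is empty, in
which case the strsplit version returns -1).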

-- 
David

>
> Cheers,
> Bert
>
> On Thu, Dec 1, 2011 at 7:44 PM, Florent D. <flodel at gmail.com> wrote:
>> Resending my code, not sure why the linebreaks got eaten:
>>
>>> x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)
>>> count.slashes <- function(string) sum(unlist(strsplit(string, NULL)) == "/")
>>> within(x, Col2 <- vapply(Col1, count.slashes, 1))
>>         Col1 Col2
>> 1     abc/def    1
>> 2 ghi/jkl/mno    2
>>
>>
>> On Thu, Dec 1, 2011 at 10:32 PM, Florent D. <flodel at gmail.com> wrote:
>>> I used within and vapply:
>>>
>>> x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"),  
>>> stringsAsFactors = FALSE)
>>> count.slashes <- function(string)sum(unlist(strsplit(string,  
>>> NULL)) ==
>>> "/")within(x, Col2 <- vapply(Col1, count.slashes, 1))
>>>          Col1 Col21     abc/def    12 ghi/jkl/mno    2
>>>
>>> On Thu, Dec 1, 2011 at 1:05 PM, Bert Gunter  
>>> <gunter.berton at gene.com> wrote:
>>>> ## It's not a data frame -- it's just a vector.
>>>>
>>>>> x
>>>> [1] "abc/def"     "ghi/jkl/mno"
>>>>> gsub("[^/]","",x)
>>>> [1] "/"  "//"
>>>>> nchar(gsub("[^/]","",x))
>>>> [1] 1 2
>>>>>
>>>>
>>>> ?gsub
>>>> ?nchar
>>>>
>>>> -- Bert
>>>>
>>>> On Thu, Dec 1, 2011 at 8:32 AM, Douglas Esneault
>>>> <Douglas.Esneault at mecglobal.com> wrote:
>>>>> I am new to R but am an experienced SAS user, and I was hoping to
>>>>> get some help on counting the occurrences of a character within
>>>>> a string at the row level.
>>>>>
>>>>> My dataframe, x,  is structured as below:
>>>>>
>>>>> Col1
>>>>> abc/def
>>>>> ghi/jkl/mno
>>>>>
>>>>> I found this code on the board but it counts all occurrences of  
>>>>> "/" in the dataframe.
>>>>>
>>>>> chr.pos <- which(unlist(strsplit(x,NULL))=='/')
>>>>> chr.count <- length(chr.pos)
>>>>> chr.count
>>>>> [1] 3
>>>>>
>>>>> I'd like to append a column, say cnt, that has the count of "/"  
>>>>> for each row.
>>>>>
>>>>> Can anyone point me in the right direction or offer some code to  
>>>>> do this?
>>>>>
>>>>> Thanks in advance for the help.
>>>>>
>>>>> Doug Esneault
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Bert Gunter
>>>> Genentech Nonclinical Biostatistics
>>>>
>>>> Internal Contact Info:
>>>> Phone: 467-7374
>>>> Website:
>>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

David Winsemius, MD
West Hartford, CT


