[R] Working with string
Marc Schwartz
marc_schwartz at me.com
Thu Jul 7 19:26:10 CEST 2011
Happy to help.
Your interpretation of "\\1" is correct: it returns the value captured by the first back reference in the regex. If you wanted to return multiple back references, these would be "\\2", "\\3" and so on, each referring to successive paren pairs in the regex. Note the double backslash here: R treats '\' as an escape character inside string literals, as you may be familiar with, so "\\1" is how you write a literal '\1'. In most regex references, you will see '\1'.
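As a small illustration of numbered back references, here is a base R sketch. The input string and the anchored pattern are made up for this example and are not taken from the thread; sub() is used because there is only one match per string.

```r
# Hypothetical input, invented for illustration only
x <- "abc123P456"

# Three paren pairs: leading digits (\\1), the letter (\\2), trailing digits (\\3)
pattern <- "^[a-z]+([0-9]+)([cCpP])([0-9]+)$"

# Return only the second back reference
letter <- sub(pattern, "\\2", x)

# Back references can be combined and reordered in the replacement
swapped <- sub(pattern, "\\3-\\2-\\1", x)

print(letter)
print(swapped)

# The R string literal "\\1" holds two characters: a backslash and the digit 1
print(nchar("\\1"))
```

Here letter should be "P" and swapped should be "456-P-123".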
For a basic introduction, you can look at ?regex in R to gain some insights into the construction of regular expressions. There are online references such as http://www.regular-expressions.info/ and there is also a good O'Reilly book "Mastering Regular Expressions" (http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124).
HTH,
Marc
On Jul 7, 2011, at 12:33 PM, Bogaso Christofer wrote:
> Thanks, Marc, for your reply and detailed explanation. As you said, I
> agree that by using the stringr package I won't gain anything really important.
> However, I have already written a long code-book and do not want to
> change anything now. Also, its function names are more meaningful.
>
> I have one more query. Does "\\1" mean that I want to return the
> selected string (instead of replacing it with something)? What other
> related things are there? Can you point me to some online references?
>
> Thanks,
>
> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at me.com]
> Sent: 07 July 2011 21:54
> To: Bogaso Christofer
> Cc: r-help at r-project.org
> Subject: Re: [R] Working with string
>
> On Jul 7, 2011, at 11:21 AM, Bogaso Christofer wrote:
>
>> Hi there, I have to extract a relevant portion from a defined
>> string, which is a mix of numbers and characters. It has the
>> following sequence:
>>
>>
>>
>> Some String - Some numerical - "c/C" (or "p/P") - then again some set
>> of numbers.
>>
>>
>>
>> Examples of such strings are "fdahsdfcha163517253c463278643",
>> "fdahsdfcha163517253C463278643", "fdahsdfcha163517253P463278643",
>> "fdahsdfcha163517253p463278643", etc.
>>
>>
>>
>> I have tried using the latest stringr package to accomplish that. Here is
>> my try:
>>
>>
>>
>>> library(stringr)
>>
>>> str_extract("fdahsdfcha163517253c463278643", "[c]")
>>
>> [1] "c"
>>
>>
>>
>> But it seems that the above code fetches the "c" from "fdahsdfcha" only. My
>> goal is to find what is between the above 2 sets of numbers: "C/c/P/p".
>> Can somebody help me with how to do that? I would like to use stringr
>> syntax because I am already using a lot of other functions from it.
>> Therefore, doing it with that package would be good in terms of
>> consistency.
>>
>>
>>
>> Thanks for your help.
>
>
> I don't use 'stringr', but you can get the desired result using ?gsub:
>
> x <- c("fdahsdfcha163517253c463278643", "fdahsdfcha163517253C463278643",
> "fdahsdfcha163517253P463278643", "fdahsdfcha163517253p463278643")
>
>
> > gsub(".+[0-9]+([cCpP])[0-9]+", "\\1", x)
> [1] "c" "C" "P" "p"
>
>
> The regex in the first argument tells gsub to find a sequence of any
> characters, followed by a sequence of digits, followed by a single 'c',
> 'C', 'p' or 'P', finally followed by a sequence of digits.
>
> Surrounding the [cCpP] in parens allows us to use a 'back reference' and
> return what is found within the parens using the "\\1" in the second
> argument.
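One point worth illustrating about the approach quoted above: gsub() returns a string unchanged when the pattern does not match it. The base R sketch below shows this alongside an alternative that pulls out the capture group with regexec() and regmatches(). The third input string and the names replaced, m and letters_found are made up for this example.

```r
x <- c("fdahsdfcha163517253c463278643",
       "fdahsdfcha163517253P463278643",
       "no digits or marker here")   # invented non-matching input

pattern <- ".+[0-9]+([cCpP])[0-9]+"

# gsub() leaves a string untouched when the pattern does not match
replaced <- gsub(pattern, "\\1", x)

# regexec() + regmatches() return, per string, the full match followed by
# the captured groups, or character(0) when there is no match
m <- regmatches(x, regexec(pattern, x))

# Keep the first capture group, or NA where nothing matched
letters_found <- vapply(
  m,
  function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
  character(1),
  USE.NAMES = FALSE
)

print(replaced)
print(letters_found)
```

With this variant a non-matching input shows up as NA instead of being passed through as the original string, which may be easier to detect afterwards.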
>
> From a brief review of the stringr manual, it looks like str_extract()
> supports the use of a regex for the pattern argument, but does not support
> the use of back references. It looks like str_replace_all() is a wrapper to
> gsub(), so you may want to look at that function and the examples for it.
> Thus, the syntax might be something like:
>
> str_replace_all(x, ".+[0-9]+([cCpP])[0-9]+", "\\1")
>
> and therefore, I am not sure what you are really saving by using it versus
> gsub() directly.
>
> HTH,
>
> Marc Schwartz
>