[R] gsub() with unicode and escape character

Nipesh Bajaj bajaj141003 at gmail.com
Sun Jul 17 15:18:41 CEST 2011


I am really sorry if I have not understood your statement correctly :(

You said:
" To put a backslash in the replacement expression of sub or gsub
(when fixed=FALSE) use 4 backslashes"

I understand that this is fine if I want to replace something with 2
backslashes. What if I want to replace it with just 1 backslash? I
tried the following, but it did not work (R keeps asking for more input):

gsub("d","\\\",my.data$animals)
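A minimal sketch of the single-backslash case, assuming the same my.data as in the original post. In the attempt above the third backslash escapes the closing quote, so the string literal never ends and R waits for more input. Four backslashes in the source are two characters in the string, and gsub() reads that pair as one literal backslash. print() shows that single character as "\\", so nchar() and cat() are used here to see what is really stored. The fixed = TRUE variant is an alternative that was not mentioned in the thread.

```r
my.data <- data.frame(animals = c("dog", "wolf", "cat"),
                      stringsAsFactors = FALSE)

# Four backslashes in the source are two characters in the string;
# gsub() treats that pair as one literal backslash in the replacement.
one_slash <- gsub("d", "\\\\", my.data$animals)

nchar(one_slash[1])          # 3: backslash, "o", "g"
cat(one_slash, sep = "\n")   # shows the strings without print()'s escaping

# Alternative: with fixed = TRUE the replacement is taken literally,
# so two backslashes in the source (one character) are enough.
one_slash_fixed <- gsub("d", "\\", my.data$animals, fixed = TRUE)
identical(one_slash, one_slash_fixed)
```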

You said:
"replacement expression backslash-digit means to use the digit'th
parenthesized subpattern as the replacement"

Would you please elaborate on this phenomenon?  If I use "backslash-digit
= 6" (six backslashes) then I don't see any difference in the end result:
> gsub("d","\\\\\\",my.data$animals)
[1] "\\og" "wolf" "cat"
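A small sketch of what backslash-digit refers to, reusing the pattern and inputs from the quoted reply below; the variable names are only for illustration. "\\1" in the source is the two-character string backslash-1, which gsub() replaces with whatever the first parenthesized group matched. The digit is a group number, not a count of backslashes. In the six-backslash call above no backslash is followed by a digit, so no backreference is involved there.

```r
x <- c("12P", "34Cat")

# Group 1 is the run of digits, group 2 is the run of letters.
pattern <- "([[:digit:]]+)([[:alpha:]]+)"

# "\\2\\1" puts the letters first, then the digits.
swapped <- gsub(pattern, "\\2\\1", x)
swapped

# A backreference can be combined with a literal backslash
# (four backslashes in the source) between the two groups.
mixed <- gsub(pattern, "\\2\\\\\\1", x)
cat(mixed, sep = "\n")
```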

It would be really helpful if you could elaborate more on these issues.

Thanks,

On Sun, Jul 17, 2011 at 8:34 AM, William Dunlap <wdunlap at tibco.com> wrote:
> To put a backslash in the replacement expression
> of sub or gsub (when fixed=FALSE) use 4 backslashes.
> The rationale is that the replacement expression
> backslash-digit means to use the digit'th parenthesized
> subpattern as the replacement and backslash-backslash means
> to put in a literal backslash.  However, the R parser also uses
> backslashes to signify things like unicode characters (that
> backslash is not in the string stored by R, but is just a
> signal to the parser) and it requires a doubled backslash
> to enter a backslash.  2*2 is 4 backslashes.  E.g.,
>
>  > gsub("([[:digit:]]+)([[:alpha:]]+)", "alpha=<<\\2>>\\\\numeric=<<\\1>>", c("12P", "34Cat"))
>  [1] "alpha=<<P>>\\numeric=<<12>>"   "alpha=<<Cat>>\\numeric=<<34>>"
>  > cat(.Last.value, sep="\n") # see what is really in the strings
>  alpha=<<P>>\numeric=<<12>>
>  alpha=<<Cat>>\numeric=<<34>>
>
> I don't know about your unicode/encoding problem.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sverre Stausland
>> Sent: Saturday, July 16, 2011 7:20 PM
>> To: r-help at r-project.org
>> Subject: [R] gsub() with unicode and escape character
>>
>> Dear helpers,
>>
>> I'm trying to replace a character with a unicode code inside a data
>> frame using gsub(), but unsuccessfully.
>>
>> > data.frame(animals=c("dog","wolf","cat"))->my.data
>> > gsub("o","\u0254",my.data$animals)->my.data$animals
>> > my.data$animals
>> [1] "dÉ”g"  "wÉ”lf" "cat"
>>
>> It's not that a data frame cannot have unicode codes, cf. e.g.
>>
>> > data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2
>> > my.data.2$animals
>> [1] dɔg  wɔlf cat
>> Levels: cat d<U+0254>g w<U+0254>lf
>>
>> I've done the best I can based on what ?gsub and ?enc2utf8 tell me,
>> but I haven't found a solution.
>>
>> Unrelated to that problem, but related to gsub() is that I can't find
>> a way for gsub() to interpret the backslash as a character. In regular
>> expressions, \\ should represent "the character \", but gsub() doesn't:
>>
>> > data.frame(animals=c("dog","wolf","cat"))->my.data
>> > gsub("d","\\",my.data$animals)
>> [1] "og"   "wolf" "cat"
>>
>> Thank you
>> Sverre
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


