[R] Confusing behavior when using gsub to insert unicode character (minimal working example provided)
David Winsemius
dwinsemius at comcast.net
Thu May 29 05:39:27 CEST 2014
On May 28, 2014, at 7:25 PM, Thomas Stewart wrote:
> Can anyone help me understand the following behavior?
>
> I want to replace the letter 'X' in the string 'text X' with '≥' (\u2265).
> The output from gsub is not what I expect. It gives: "text ≥".
>
> Now, suppose I want to replace the character '≤' in the string 'text ≤'
> with '≥'. Then, gsub gives the expected, desired output.
>
> What am I missing?
>
> Thanks for any insight.
> -tgs
>
> Minimal Working Example:
>
> string1 <- "text X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1
Try this instead:
> new_string1 <- gsub("X","\\\u2265",string1); new_string1
[1] "text ≥"
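In an R string literal, "\\" is a single backslash and "\u2265" is a single character, so "\\\u2265" is a two-character string: a backslash followed by ≥. In the replacement argument of gsub a backslash acts as an escape, so the character after it is inserted literally.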
> nchar("\\")
[1] 1
> nchar("\\\u2265")
[1] 2
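To make the difference between the two replacement strings concrete, here is a minimal sketch that can be pasted into an R session. The names repl_plain, repl_escaped, out_plain and out_escaped are made up for illustration, and what the gsub() calls print may depend on the R version and locale (a UTF-8 session is assumed here).

```r
# Two candidate replacement strings, written with R's string escapes.
repl_plain   <- "\u2265"     # one character: the sign itself
repl_escaped <- "\\\u2265"   # two characters: a backslash, then the sign

nchar(repl_plain)            # 1
nchar(repl_escaped)          # 2
charToRaw(repl_escaped)      # 5c e2 89 a5 (backslash, then three UTF-8 bytes)

# Use each string as the replacement and compare what comes back.
string1     <- "text X"
out_plain   <- gsub("X", repl_plain, string1)
out_escaped <- gsub("X", repl_escaped, string1)

print(out_plain)
print(out_escaped)

# Compare byte for byte, which sidesteps any display problem in the console.
identical(charToRaw(out_plain), charToRaw(out_escaped))
```

Comparing the raw bytes rather than the printed strings keeps the check independent of how the console renders non-ASCII characters.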
You would be well served by reading the help page on string escapes:
?Quotes
--
David.
>
> string2 <- "text \u2264"; string2
> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
>
> charToRaw(new_string1)
> charToRaw(new_string2)
>
> sessionInfo()
>
> ## OUTPUT
>
>> string1 <- "text X"; string1
> [1] "text X"
>
>> new_string1 <- gsub("X","\u2265",string1); new_string1
> [1] "text ≥"
>
>> string2 <- "text \u2264"; string2
> [1] "text ≤"
>
>> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
> [1] "text ≥"
>
>> charToRaw(new_string1)
> [1] 74 65 78 74 20 e2 89 a5
> charToRaw("\\\u2265")
[1] 5c e2 89 a5
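A further check that may help narrow this down, offered as a sketch rather than a diagnosis: the charToRaw() output quoted in this message shows the same bytes for both results, so it can be worth looking at how each string's encoding is declared. Encoding() is base R. The name marked1 is made up for illustration, and whether re-declaring the encoding changes what is printed on a given platform is an assumption, not something confirmed in this thread.

```r
# Rebuild the two results from the minimal working example.
string1     <- "text X"
new_string1 <- gsub("X", "\u2265", string1)
string2     <- "text \u2264"
new_string2 <- gsub("\u2264", "\u2265", string2)

# Same bytes in both results?
identical(charToRaw(new_string1), charToRaw(new_string2))

# Declared encoding of each result: "UTF-8", "latin1", "bytes" or "unknown".
Encoding(new_string1)
Encoding(new_string2)

# Assumption: if a result holds UTF-8 bytes but is not declared as UTF-8,
# declaring it explicitly may change how it is displayed.
marked1 <- new_string1
Encoding(marked1) <- "UTF-8"
print(marked1)
```

If the two Encoding() calls disagree while the bytes are identical, the difference lies in the encoding mark rather than in the characters that gsub inserted.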
>
>> charToRaw(new_string2)
> [1] 74 65 78 74 20 e2 89 a5
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
It was a good idea to post sessionInfo(), but it would have been even better to have posted in plain text.
> [[alternative HTML version deleted]]
>
--
David Winsemius
Alameda, CA, USA