[R] gsub() with unicode and escape character

Sverre Stausland johnsen at fas.harvard.edu
Sun Jul 17 20:00:04 CEST 2011


Sorry for not including those details. Here is a more detailed description:

> data.frame(animals=c("dog","wolf","cat"))->my.data
> gsub("o","\u0254",my.data$animals)->my.data$animals
> my.data$animals
[1] "dɔg"  "wɔlf" "cat"

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
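
One way to check whether the problem lies in the encoding mark rather than in the bytes is sketched below. It assumes (this is not confirmed anywhere in this thread) that gsub() returned valid UTF-8 bytes without declaring them as UTF-8, so that a CP1252 locale prints each byte as a separate character. The names `replaced` and `fixed` are only illustrative.

```r
# Hypothetical diagnostic; `replaced` and `fixed` are illustrative names.
animals <- c("dog", "wolf", "cat")
replaced <- gsub("o", "\u0254", animals)

# Declared encoding of each element ("unknown" means native or ASCII).
print(Encoding(replaced))

# Raw bytes of the first element; U+0254 encoded as UTF-8 is c9 94.
print(charToRaw(replaced[1]))

# Assumption: the bytes are UTF-8 but the mark is missing, so declare it.
fixed <- replaced
Encoding(fixed) <- "UTF-8"
print(fixed)
```

If Encoding(replaced) already reports "UTF-8" for the changed elements, the assumption does not hold and the cause lies elsewhere.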

Best
Sverre

On Sun, Jul 17, 2011 at 2:26 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> You forgot the 'at a minimum' information required by the posting guide.
>
> Most likely this is a limitation of the locale you used (and failed to tell
> us about) on the OS you used (...).
>
> On Sat, 16 Jul 2011, Sverre Stausland wrote:
>
>> Dear helpers,
>>
>> I'm trying to replace a character with a Unicode character inside a data
>> frame using gsub(), without success.
>>
>>> data.frame(animals=c("dog","wolf","cat"))->my.data
>>> gsub("o","\u0254",my.data$animals)->my.data$animals
>>> my.data$animals
>>
>> [1] "dÉ”g"  "wÉ”lf" "cat"
>>
>> It's not that a data frame cannot have unicode codes, cf. e.g.
>>
>>> data.frame(animals=c("d\u0254g","w\u0254lf","cat"))->my.data.2
>>> my.data.2$animals
>>
>> [1] dɔg  wɔlf cat
>> Levels: cat d<U+0254>g w<U+0254>lf
>>
>> I've done the best I can based on what ?gsub and ?enc2utf8 tell me,
>> but I haven't found a solution.
>>
>> Unrelated to that problem, but also involving gsub(): I can't find
>> a way to make gsub() treat the backslash as a literal character. In a
>> regular expression, \\ should represent "the character \", but gsub()
>> doesn't treat it that way in the replacement:
>>
>>> data.frame(animals=c("dog","wolf","cat"))->my.data
>>> gsub("d","\\",my.data$animals)
>>
>> [1] "og"   "wolf" "cat"
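
A short sketch on the backslash question just above: in the replacement argument of gsub() (with the default fixed = FALSE), the backslash introduces backreferences such as \1, so a literal backslash has to be escaped once more, or the replacement has to be taken literally with fixed = TRUE. The names `x`, `via_escape` and `via_fixed` are illustrative.

```r
x <- c("dog", "wolf", "cat")

# "\\\\" in R source is two backslash characters; gsub() reads that pair
# in the replacement as one literal backslash.
via_escape <- gsub("d", "\\\\", x)

# With fixed = TRUE the replacement is used as is, so "\\" (one backslash
# character) is inserted directly.
via_fixed <- gsub("d", "\\", x, fixed = TRUE)

# writeLines() shows the actual characters rather than the escaped form
# that print() displays.
writeLines(via_escape)
```

Note that print() shows the result with the backslash doubled, because it displays strings in escaped form; the string itself contains a single backslash.
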
>>
>> Thank you
>> Sverre
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595


