[R] Encoding() and strsplit()

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Nov 7 09:15:55 CET 2008


See the 'R Internals' manual.

ASCII strings are never marked as either Latin-1 or UTF-8, so they always 
report "unknown".
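
A minimal sketch of that behaviour, for illustration only. The variable names are hypothetical and not from the original post, and the non-ASCII character is written as a byte escape so the example does not depend on the encoding of the script file. The mark reported for the non-ASCII piece after strsplit() may be "latin1" or "UTF-8" depending on the locale, so no particular value is assumed there.

```r
## Sketch only: variable names are illustrative, not from the thread.
a <- "abc"                     # pure ASCII
Encoding(a) <- "latin1"        # silently ignored for ASCII strings
enc_ascii <- Encoding(a)       # remains "unknown"

x <- "fa\xE7ile"               # byte 0xE7 is c-cedilla in Latin-1
Encoding(x) <- "latin1"        # non-ASCII, so the declaration is recorded
enc_marked <- Encoding(x)      # "latin1"

## The declared encoding is not an R-level attribute of the vector.
attrs <- attributes(x)         # NULL

## Split into single characters: the ASCII pieces come back unmarked,
## and only the non-ASCII piece carries a declared encoding.
parts <- strsplit(x, "")[[1]]
enc_parts <- Encoding(parts)

print(enc_ascii)
print(enc_marked)
print(enc_parts)
```

This matches the output in the question: the declaration is kept per string, only for strings that contain non-ASCII characters, and it does not appear in attributes() or str().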

On Fri, 7 Nov 2008, Heinz Tuechler wrote:

> Dear All,
>
> Encoding() goes beyond my understanding. See the example. I would expect from 
> reading the help for Encoding() that strsplit preserves the encoding for each 
> resulting element, but for simple letters it gets lost.
> Also it seems that an Encoding() cannot be declared for simple letters. They 
> remain in any case "unknown". In paste() "latin1" seems to dominate 
> "unknown".
> What kind of characteristic of an object is the encoding? It does not show up 
> as attribute and also str() does not give me any hint.
> Where can I find some explanation regarding encoding?
>
> Thanks
>
> Heinz
>
> ###   Encoding() and strsplit
> u <- 'abcäöü'
> Encoding(u)
> [1] "latin1"
> Encoding(u) <- 'latin1' # to be sure about encoding
> us <- strsplit(u, '')[[1]] # split in single strings
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
> Encoding(us) <- rep('latin1', length(us))
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
> pus <- paste(us[1], us[5], sep='')
> Encoding(pus)
> [1] "latin1"
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status = Patched
> major = 2
> minor = 8.0
> year = 2008
> month = 11
> day = 04
> svn rev = 46830
> language = R
> version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>
> Windows XP (build 2600) Service Pack 2
>
> Locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices, 
> package:utils, package:datasets, package:methods, Autoloads, package:base
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

