[R] Encoding() and strsplit()
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Nov 7 09:15:55 CET 2008
See the 'R Internals' manual.
Strings that contain only ASCII characters are never marked as Latin-1 or
UTF-8, so Encoding() always reports "unknown" for them.
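
A minimal sketch of the behaviour described above, offered as an illustration rather than as part of the original reply. It builds the non-ASCII string from raw Latin-1 bytes so that the result does not depend on the session locale. The names non_ascii, ascii and is_ascii are hypothetical, chosen only for this example; is_ascii is not a base R function.

```r
# Build "äöü" from its Latin-1 bytes so the example does not depend on
# how this file or the session locale encodes non-ASCII text.
non_ascii <- rawToChar(as.raw(c(0xe4, 0xf6, 0xfc)))
ascii <- "abc"

# Declare both as Latin-1. The declaration only sticks for the string
# that actually contains non-ASCII bytes.
Encoding(non_ascii) <- "latin1"
Encoding(ascii) <- "latin1"

enc_non_ascii <- Encoding(non_ascii)  # "latin1"
enc_ascii <- Encoding(ascii)          # "unknown"

# The mark is not an R-level attribute, so attributes() has nothing to show.
attrs <- attributes(non_ascii)        # NULL

# Hypothetical helper (not in base R): TRUE if every byte of a single
# string is below 0x80, i.e. the string is pure ASCII and cannot be marked.
is_ascii <- function(x) all(as.integer(charToRaw(x)) < 128L)
```

Checking is_ascii(x) before inspecting Encoding(x) is one way to tell "unknown because it is ASCII" apart from "unknown because nobody declared an encoding".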
On Fri, 7 Nov 2008, Heinz Tuechler wrote:
> Dear All,
>
> Encoding() goes beyond my understanding. See the example. I would expect from
> reading the help for Encoding() that strsplit preserves the encoding for each
> resulting element, but for simple letters it gets lost.
> Also, it seems that an encoding cannot be declared for simple letters: they
> remain "unknown" in any case. In paste(), "latin1" seems to dominate
> "unknown".
> What kind of characteristic of an object is the encoding? It does not show up
> as an attribute, and str() does not give me any hint either.
> Where can I find some explanation regarding encoding?
>
> Thanks
>
> Heinz
>
> ### Encoding() and strsplit
> u <- 'abcäöü'
> Encoding(u)
> [1] "latin1"
> Encoding(u) <- 'latin1' # to be sure about encoding
> us <- strsplit(u, '')[[1]] # split in single strings
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
> Encoding(us) <- rep('latin1', length(us))
> Encoding(us)
> [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
> pus <- paste(us[1], us[5], sep='')
> Encoding(pus)
> [1] "latin1"
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status = Patched
> major = 2
> minor = 8.0
> year = 2008
> month = 11
> day = 04
> svn rev = 46830
> language = R
> version.string = R version 2.8.0 Patched (2008-11-04 r46830)
>
> Windows XP (build 2600) Service Pack 2
>
> Locale:
> LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices,
> package:utils, package:datasets, package:methods, Autoloads, package:base
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595