[R] encoding accents and tildes in R Macosx
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Aug 11 09:31:25 CEST 2008
On Mon, 11 Aug 2008, Kenneth Roy Cabrera Torres wrote:
> Hi Carlos:
>
> I think you have an encoding problem.
> Maybe it is easier to convert the file.
>
> I don't know how to convert in Mac OS, but
> in Linux you can use "iconv", which converts between many
> encodings.
Well, R has an iconv() command even on Mac OS X, and my iMac has 'iconv'
as a command-line program. But you need to know what to convert from and
to.
> Is the original file from a Windows system?
> Maybe the encoding is windows-1256 and you need
> to convert to a compatible Mac encoding.
Hmm, in latin1 as Windows uses it (CP1252, the most plausible Windows
encoding) \x92 is a right single quote and \x96 is an en dash. 1256 is Arabic.
I think this is a Mac encoding, an obsolete one (Mac OS X in the main uses
UTF-8): in MacRoman \x96 is ñ and \x92 is í, which would fit Spanish text.
Try encoding="macroman".
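One way to check that guess is to decode the two escaped strings by hand with iconv(). This is only a sketch: pick_mac_encoding() is a made-up helper, needed because the spelling that iconv accepts for this encoding ("macroman", "MACINTOSH" or "MAC") depends on the platform's iconv implementation.

```r
# Hypothetical helper: return the first spelling of the Mac Roman encoding
# that this platform's iconv accepts.
pick_mac_encoding <- function() {
  for (enc in c("macroman", "MACINTOSH", "MAC")) {
    ok <- tryCatch(!is.na(iconv("a", from = enc, to = "UTF-8")),
                   error = function(e) FALSE)
    if (ok) return(enc)
  }
  stop("this iconv does not seem to know a Mac Roman encoding")
}

# The two strings exactly as they were reported, with the raw bytes escaped.
raw_strings <- c("Flore\x96a", "Rancher\x92a")

# Reinterpret the bytes as Mac Roman and convert them to UTF-8.
decoded <- iconv(raw_strings, from = pick_mac_encoding(), to = "UTF-8")
print(decoded)
```

In a UTF-8 session this should print the two names with their ñ and í restored.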
However, if you read ?read.table, you will see that *its* encoding
argument does not re-encode. You want
# open the connection explicitly so that read.table() leaves it open
con <- file(<filename>, open="r", encoding="macroman")
tmp <- read.table(con, ...)
close(con)
There's an example on ?file (as 'encoding' in ?read.table says).
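To see the whole recipe run from start to finish, here is a minimal sketch that first writes a small tab-separated file containing the two Mac Roman bytes and then reads it back through a re-encoding connection. The file contents, the column names and the names tmp_path and mac_enc are invented for illustration; the real file would be 'Sectiondic.txt'. It assumes a UTF-8 session, and it probes for the encoding name because the accepted spelling varies between iconv implementations.

```r
# Find a spelling of the Mac Roman encoding that this platform accepts.
mac_enc <- NULL
for (enc in c("macroman", "MACINTOSH", "MAC")) {
  ok <- tryCatch(!is.na(iconv("a", from = enc, to = "UTF-8")),
                 error = function(e) FALSE)
  if (ok) { mac_enc <- enc; break }
}
if (is.null(mac_enc)) stop("no Mac Roman encoding available in this iconv")

# Write an illustrative file whose bytes 0x96 and 0x92 are Mac Roman.
tmp_path <- tempfile(fileext = ".txt")
writeBin(charToRaw("id\tname\n1\tFlore\x96a\n2\tRancher\x92a\n"), tmp_path)

# Open the connection with the declared encoding; the text is re-encoded
# to the session's native encoding as it is read.
con <- file(tmp_path, open = "r", encoding = mac_enc)
section <- read.table(con, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
close(con)
unlink(tmp_path)

print(section)
```

Current versions of read.table() also take a fileEncoding argument, which sets up the same kind of re-encoding connection when the file is given by name.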
>
> Hope this helps.
>
> Kenneth
> On Sun, 10-08-2008 at 22:14 -0700, Carlos Cuartas wrote:
>> Hello,
>> In R under Mac OS X 10.5.4 I've had problems when I've tried to read a
>> data.frame with characters including tildes and accents. For instance
>> Floreña is changed to Flore\x96a and Ranchería is changed to Rancher\x92a
>> In the code:
>> section<-read.table('Sectiondic.txt',sep='\t',header=T,stringsAsFactors=F,encoding="
>> ") I've changed the "encoding" argument but I have not been able to find a
>> solution.
>> Any suggestion?
>>
>> Thanks a lot
>>
>> Carlos Cuartas
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595