[R] questions on French characters in plot
Richard Zijdeman
richard.zijdeman at me.com
Tue Dec 11 16:41:24 CET 2012
Dear Milan,
thank you for your kind suggestion. Converting the characters using:
> iconv(department, "ISO-8859-15", "UTF-8")
indeed improves the situation in that now all values (names of departments) are displayed in the plot, although the specific special characters are unfortunately appearing as empty boxes.
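One possible explanation for the empty boxes, offered as an assumption rather than something established in this thread: the byte 0x8f is a control character in ISO-8859-15, whereas in the classic Mac OS Roman encoding it is "è". If the Stata file was written on a Mac using that encoding, a sketch like the following might recover the names. The encoding name "macintosh" depends on the platform's iconv and should be checked against iconvlist().

```r
# Sample data as imported: the third name contains the raw byte 0x8f
department <- c("Nord", "Paris", "Ard\x8fche")

# Treating the bytes as ISO-8859-15 turns 0x8f into the control character
# U+008F, which most fonts draw as an empty box
as_latin9 <- iconv(department, "ISO-8859-15", "UTF-8")
utf8ToInt(as_latin9[3])[4]    # 143, i.e. U+008F

# Assumption: the source encoding is really Mac OS Roman
as_macroman <- iconv(department, "macintosh", "UTF-8")
utf8ToInt(as_macroman[3])[4]  # 232, i.e. U+00E8 ("è")
```

If the second conversion yields U+00E8, passing as_macroman to plot() instead of the original vector should show the accented name.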
I have tried to see whether I could save my Stata file in UTF-8 format; although this appears to be a popular request, it does not seem to have been implemented in Stata.
Best and thank you for your help,
Richard
On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> Le mardi 11 décembre 2012 à 01:10 +0100, Richard Zijdeman a écrit :
>> Dear all,
>>
>> I have imported a dataset from Stata using the foreign package. The
>> original data contain French characters such as and .
>> After importing, string variables containing names of French
>> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
>> to ask how I could plot these changed strings, since now the strings
>> with special characters fail to be printed in the plot (either using
>> plot() or ggplot()).
>>
>> I have googled for solutions, but actually find it hard to determine
>> whether I should change my R setup or should read in the data in a
>> different way. Since I work on a Mac I changed my locale according to
>> the R for Mac OS X FAQ, chapter 9. Below is some info on my setup and
>> code and output on what works for me and what does not. Thank you in
>> advance for your comments.
> Accented characters should work fine on a machine using a UTF-8
> locale as yours. I think the problem is that the imported data uses
> ISO8859-15 or UTF-16, not UTF-8.
>
> I have no idea whether .dta files specify an encoding or not, but I
> think you can convert them in R by calling
> iconv(department, "ISO-8859-15", "UTF-8")
> or
> iconv(department, "UTF-16", "UTF-8")
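Before choosing a source encoding for iconv(), it may help to look at the raw bytes of the affected string. A small sketch using the sample vector from the script quoted further down; note that validUTF8() needs R 3.3.0 or later, which is newer than the R 2.15.2 used in this thread.

```r
# Sample data: the third name contains the raw byte 0x8f
department <- c("Nord", "Paris", "Ard\x8fche")

# FALSE marks elements that are not valid UTF-8
validUTF8(department)

# Show the underlying bytes; the byte after "Ard" (41 72 64) is 8f
charToRaw(department[3])
```

A single byte above 0x7f standing in for one accented letter suggests a one-byte encoding (Latin-1, Latin-9, Mac OS Roman and so on) rather than UTF-8 or UTF-16.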
>
>> Best,
>>
>> Richard
>>
>> #--------------
>> rm(list=ls())
>> sessionInfo()
>> # R version 2.15.2 (2012-10-26)
>> # Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> #
>> # locale:
>> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> # creating variables
>> department <- c("Nord","Paris","Ard\x8fche")
> \x8f does not correspond to "è" AFAIK. In ISO8859-1 and -15 and UTF-16,
> it's \xE8 ("\uE8" in R).
>
> In UTF-8, it's C3 A8, "\303\250" in R.
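The byte values mentioned above can be checked directly in R. A short sketch (the variable name e_grave is only illustrative):

```r
# "è" as a UTF-8 string
e_grave <- enc2utf8("\u00e8")

charToRaw(e_grave)                                 # c3 a8
charToRaw(iconv(e_grave, "UTF-8", "ISO-8859-15"))  # e8
charToRaw("\303\250")                              # c3 a8 (octal escapes)
```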
>
>> department2 <- c("Nord", "Paris", "Ardèche")
>> n <- c(2,4,1)
>>
>> # creating dataframes
>> df <- data.frame(department,n)
>> df2 <- data.frame(department2,n)
>>
>> department
>> # [1] "Nord" "Paris" "Ard\x8fche"
>> department2
>> # [1] "Nord" "Paris" "Ardèche"
>>
>> plot(df) # fails to show the text "Ardèche"
>> plot(df2) # shows text "Ardèche"
>>
>> # EOF