[R] questions on French characters in plot
Milan Bouchet-Valat
nalimilan at club.fr
Tue Dec 11 16:58:00 CET 2012
On Tuesday 11 December 2012 at 16:41 +0100, Richard Zijdeman wrote:
> Dear Milan,
>
> thank you for your kind suggestion. Converting the characters using:
> > iconv(department, "ISO-8859-15", "UTF-8")
> indeed improves the situation in that now all values (names of
> departments) are displayed in the plot, although the specific special
> characters are unfortunately appearing as empty boxes.
I wouldn't call that an improvement... :-/
What's the result of the other one, i.e.
iconv(department, "UTF-16", "UTF-8")
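
One possible reason for the empty boxes, offered as a sketch rather than a diagnosis: when the source encoding given to iconv() is wrong, the conversion can still succeed and silently produce a different character. The vector below is copied from the original post. The idea that byte 8f is a non-printing control character under ISO-8859-15 is an assumption about typical iconv implementations and may differ on your platform.

```r
# Vector copied from the original post (byte 0x8f where "è" should be)
department <- c("Nord", "Paris", "Ard\x8fche")

# Converting with a guessed source encoding does not report an error here...
converted <- iconv(department, "ISO-8859-15", "UTF-8")

# ...so inspect the bytes to see what 0x8f was turned into
charToRaw(converted[3])

# Assumption: under ISO-8859-15, 0x8f maps to a control character, so the
# result can be valid UTF-8 and still not contain the intended letter
validUTF8(converted)
converted[3] == "Ard\u00e8che"
```

If the last comparison is FALSE while validUTF8() is TRUE, the source encoding was probably guessed wrongly rather than the conversion having failed.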
> I have tried to see whether I could 'save' my Stata file in UTF-8
> format; although this seems to be a popular request, it does not
> seem to have been implemented in Stata.
You should not need this. iconv() should be able to convert the strings
to something usable. The difficulty is determining the original
encoding. Could you call
lapply(department, charToRaw)
and post the output?
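
A minimal sketch of that check, using the vector from the original post. The list of candidate encodings is an assumption for illustration. In particular "macintosh" (Mac OS Roman, where byte 8f is "è") is only a guess at what a file written on an older Mac might use, and the encoding names accepted by iconv() vary by platform (see iconvlist()).

```r
# Vector copied from the original post
department <- c("Nord", "Paris", "Ard\x8fche")

# Show the raw bytes of each string; bytes >= 0x80 are the non-ASCII ones
raw_bytes <- lapply(department, charToRaw)
raw_bytes[[3]]

# Hypothetical list of candidate source encodings to try
candidates <- c("ISO-8859-15", "macintosh")

# Convert the problematic string from each candidate to UTF-8
attempts <- sapply(candidates, function(enc) {
  iconv(department[3], from = enc, to = "UTF-8")
})

# Keep the candidate(s) that reproduce the expected spelling
expected <- "Ard\u00e8che"
names(attempts)[!is.na(attempts) & attempts == expected]
```

Whichever candidate survives the last line can then be used as the "from" argument when converting the whole column.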
Regards
> Best and thank you for your help,
>
> Richard
>
>
> On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>
> > On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
> >> Dear all,
> >>
> >> I have imported a dataset from Stata using the foreign package. The
> >> original data contain French accented characters such as è.
> >> After importing, string variables containing names of French
> >> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
> >> to ask how I could plot these changed strings, since now the strings
> >> with special characters fail to be printed in the plot (either using
> >> plot() or ggplot2).
> >>
> >> I have googled for solutions, but actually find it hard to determine
> >> whether I should change my R setup or should read in the data in a
> >> different way. Since I work on a Mac, I changed my locale according to
> >> the R for Mac OS X FAQ, chapter 9. Below is some info on my setup,
> >> with code and output showing what works for me and what does not.
> >> Thank you in advance for your comments.
> > Accented characters should work fine on a machine using a UTF-8
> > locale like yours. I think the problem is that the imported data uses
> > ISO-8859-15 or UTF-16, not UTF-8.
> >
> > I have no idea whether .dta files specify an encoding or not, but I
> > think you can convert them in R by calling
> > iconv(department, "ISO-8859-15", "UTF-8")
> > or
> > iconv(department, "UTF-16", "UTF-8")
> >
> >> Best,
> >>
> >> Richard
> >>
> >> #--------------
> >> rm(list=ls())
> >> sessionInfo()
> >> # R version 2.15.2 (2012-10-26)
> >> # Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >> #
> >> # locale:
> >> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>
> >> # creating variables
> >> department <- c("Nord","Paris","Ard\x8fche")
> > \x8f does not correspond to "è" AFAIK. In ISO-8859-1 and -15 and UTF-16,
> > it's \xE8 ("\uE8" in R).
> >
> > In UTF-8, it's C3 A8, "\303\250" in R.
> >
> >> department2 <- c("Nord", "Paris", "Ardèche")
> >> n <- c(2,4,1)
> >>
> >> # creating dataframes
> >> df <- data.frame(department,n)
> >> df2 <- data.frame(department2,n)
> >>
> >> department
> >> # [1] "Nord" "Paris" "Ard\x8fche"
> >> department2
> >> # [1] "Nord" "Paris" "Ardèche"
> >>
> >> plot(df) # fails to show the text "Ardèche"
> >> plot(df2) # shows text "Ardèche"
> >>
> >> # EOF