[R] Writing Unicode Text into Text File from R (in Windows)

Majid Einian einian85 at gmail.com
Wed Feb 19 08:03:00 CET 2014


On Tue, Feb 4, 2014 at 4:18 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> On 14-02-04 5:49 AM, Majid Einian wrote:
>>
>> Dear R Helpers,
>>
>> Consider this code:
>>
>> a <- intToUtf8(1777)
>> show(a)
>> zz <- file(description="test.txt",open="w",encoding="UTF-8")
>> cat(a, file = zz)
>> close(zz)
>>
>> In a Unicode-aware environment (such as the RGui console or the RStudio
>> console), you will see this output:
>>
>> [1] "۱"
>>
>>
>> but the character is not written correctly to the file test.txt (which is
>> encoded in UTF-8 without a BOM):
>>
>> <U+06F1>
>>
>> The problem seems to be this: R converts the text to the system locale (for
>> me this is Arabic Windows, code page 1256, which has no code for U+06F1),
>> then converts it back to UTF-8 and writes it to the file.
>> What am I missing here?
>> How can I write a Unicode string to a text file correctly?
>
>
> There are a lot of places in R where it converts strings to the local encoding, perhaps too many. On the other hand, maybe Windows should be offering UTF-8 locales by now.

I would like to see that happen too! I have no such problem on Linux.
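
To check whether a given session is exposed to this round trip, it may help to look at how R has marked the string and whether the session locale is UTF-8. This is only a diagnostic sketch; the variable names are illustrative and not part of the original example.

```r
a <- intToUtf8(1777)                      # U+06F1, as in the original example

enc_mark   <- Encoding(a)                 # how R has marked the string internally
ctype      <- Sys.getlocale("LC_CTYPE")   # character-type locale of the session
utf8_local <- l10n_info()[["UTF-8"]]      # TRUE when the session locale is UTF-8

print(enc_mark)
print(ctype)
print(utf8_local)
```

If the last value is FALSE, strings passed through the native encoding can lose characters that the local code page cannot represent.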

>
> I haven't tested in your locale, but I believe writeLines() to a connection declared to be in a UTF-8 encoding will maintain the encoding.

writeLines() also converts the string to the system encoding and then
back to UTF-8, just like cat().
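
A possible workaround, not proposed in this thread, is to avoid the text conversion entirely by writing the UTF-8 bytes through a binary connection. The sketch below assumes that raw output is acceptable (no newline translation is done), and the temporary file path is a stand-in for test.txt.

```r
a <- intToUtf8(1777)                  # U+06F1, marked as UTF-8 by R
path <- tempfile(fileext = ".txt")    # hypothetical output path

con <- file(path, open = "wb")        # binary mode: no re-encoding on write
writeBin(charToRaw(enc2utf8(a)), con) # write the UTF-8 bytes as they are
close(con)

# Read the bytes back to see what actually landed in the file
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)                          # the two UTF-8 bytes of U+06F1: db b1
```

Calling writeLines(a, con, useBytes = TRUE) on a connection opened without an encoding argument is often suggested for the same purpose, but it has not been tested in this locale here.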

>  You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input; I forget whether it will write one on output.  If it doesn't, you can always write one explicitly.
>

I have no problem with BOM being there or not.
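
For readers who do need a BOM, writing one explicitly, as suggested above, could look like the following sketch. It uses a binary connection so that the three BOM bytes and the text are written unchanged; the temporary path and variable names are illustrative assumptions.

```r
path <- tempfile(fileext = ".txt")           # hypothetical output path
con  <- file(path, open = "wb")              # binary mode: bytes written as given

writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)   # UTF-8 byte order mark
writeBin(charToRaw(enc2utf8(intToUtf8(1777))), con)  # the text itself
close(con)

head3 <- readBin(path, what = "raw", n = 3)  # first three bytes of the file
print(head3)
```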

> I was hoping to make some progress on this before R 3.1.0 so that more cases of writing strings to UTF-8 files would work, but time is running out.

I hope we see this happen soon :)

Majid Einian

>
> Duncan Murdoch
>
>>
>>
>> Majid Einian,
>> Economics Researcher, Monetary and Banking Research Institute, Central Bank
>> of Islamic Republic of Iran, Tehran, IRAN
>> and
>> PhD Candidate in "Economics", Graduate School of Management and
>> Economics, Sharif University of Technology, Tehran, IRAN



