[R] Export Unicode characters from R
Sverre Stausland
johnsen at fas.harvard.edu
Fri Jul 15 23:36:18 CEST 2011
Hi,
I'm interested in the suggestion to use writeLines( ...,
useBytes=TRUE), but how do I use this function to export data
from R? Could you please provide a simple example?
The following suggestion worked very well:
> funny.g<- "\u1E21"
> rawstuff<- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")
But the function charToRaw() only accepts a single character string,
and writeBin() cannot be used to export data frames. Is there
any solution along these lines when I have a data frame with Unicode
characters?
Best
Sverre
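
A minimal sketch of how the writeLines( ..., useBytes=TRUE) suggestion might be applied to a data frame. The helper export_utf8(), the example data frame and the output path are illustrative assumptions, not something given in the thread. The idea is to convert every column to UTF-8 text, paste the columns into one line per row, and write those lines without re-encoding:

```r
# Hypothetical helper (not from the thread): write a data frame as
# tab-separated UTF-8 text using writeLines(useBytes = TRUE).
export_utf8 <- function(df, path, sep = "\t") {
  # Convert every column to character and make sure it is encoded as UTF-8.
  cols <- lapply(df, function(col) enc2utf8(as.character(col)))
  # One header line, then one line per row.
  header <- paste(enc2utf8(names(df)), collapse = sep)
  rows <- do.call(paste, c(unname(cols), sep = sep))
  # Binary mode avoids newline translation on Windows.
  con <- file(path, open = "wb")
  on.exit(close(con))
  # useBytes = TRUE writes the UTF-8 bytes as they are, without
  # converting them to the native encoding of the local system.
  writeLines(c(header, rows), con, sep = "\n", useBytes = TRUE)
  invisible(path)
}

df <- data.frame(id = 1:2, letter = c("\u1E21", "g"),
                 stringsAsFactors = FALSE)
out <- file.path(tempdir(), "funny.g.txt")
export_utf8(df, out)
```

Reading the file back with readLines(out, encoding = "UTF-8") should show the original character. Note that this sketch does no quoting or escaping, so it assumes the fields contain no tabs or newlines.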
On Fri, Jul 15, 2011 at 2:38 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 15/07/2011 1:42 PM, Sverre Stausland wrote:
>>
>> >>>
>> >>> > funny.g<- "\u1E21"
>> >>> > funny.g
>> >>
>> >> [1] "ḡ"
>> >>
>> >>> > data.frame (funny.g) -> funny.g
>> >>> > funny.g$funny.g
>> >>
>> >> [1] ḡ
>> >> Levels:<U+1E21>
>> >
>> > I think the problem is in the data.frame code, not in writing. Data frames
>> > try to display things in a readable way, and since you're on Windows, where
>> > UTF-8 is not really supported, the code helpfully changes that character to
>> > the "<U+1E21>" string for display.
>>
>> I thought the data.frame function didn't alter the unicode coding,
>> since funny.g$funny.g above still displays the right unicode character
>> (although it does list the levels as<U+1E21>).
>>
>> > You should be able to write the Unicode character to file if you use lower
>> > level methods such as cat(), on a connection opened using the file()
>> > function with the encoding set explicitly.
>>
>> I'm sorry, but I don't understand what it means "to use cat() on a
>> connection opened using the file() function". Could you please clarify
>> that?
>>
>
> I just checked on how R does it. We use UTF-8 encodings in the help pages,
> regardless of what kind of system you're running on.
>
> It converts the strings to UTF-8 internally first (your funny.g is already
> encoded that way; see Encoding(funny.g)) then uses
>
> writeLines( ..., useBytes=TRUE)
>
> to write it. The useBytes argument says not to try to make the file
> readable on the local system, just write out the bytes.
>
> Another way to do it is to get your strings in the UTF-8 encoding, convert
> them to raw vectors, and use writeBin() to write those out. For example,
>
> funny.g<- "\u1E21"
> rawstuff<- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")
>
>
> All of this appears hard, because you're thinking of UTF-8 as text, but on
> Windows, R thinks of it as a binary encoding. Modern Windows systems can
> handle UTF-8, but not all programs on them can.
>
> Duncan Murdoch
>
>
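
The writeBin() route quoted above can also be made to handle more than one string. charToRaw() only uses the first element of its argument, so one option is to join all lines into a single string before converting. This is only a sketch: the helper write_utf8_bin() and the file name are assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): write a character vector
# to a file as raw UTF-8 bytes, one element per line.
write_utf8_bin <- function(lines, path) {
  # charToRaw() ignores all but the first element, so collapse the
  # vector into one newline-separated string first.
  text <- paste(enc2utf8(lines), collapse = "\n")
  # Append a final newline and write the bytes unchanged.
  writeBin(charToRaw(paste0(text, "\n")), path)
  invisible(path)
}

out_bin <- file.path(tempdir(), "funny.g.bin.txt")
write_utf8_bin(c("\u1E21", "plain g"), out_bin)
```

For a data frame, the lines would first have to be built by pasting the columns together, for instance one tab-separated string per row.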