[Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

Thu Apr 11 08:10:53 CEST 2019

On 4/10/19 6:13 PM, Tomáš Bořil wrote:

> An optional parameter to the source() function which would translate all
> UTF-8 characters in string literals to their "\Uxxxx" codes sounds like
> a great idea (and I hope it would fix 99.9% of the problems I have -
> because that is the way I overcome these problems nowadays) - and the
> same behaviour in command line...

I was not suggesting converting to \Uxxxx in source(). Some users do that
in their programs by hand or with an external utility. In principle,
source() could be made to work similarly to eval(parse(file, encoding=))
with respect to encodings, via other means, and we will consider that.
However, there are many remaining places where the conversion happens; a
trivial one is that you currently cannot print the result of the parse()
from your example properly. Maybe you don't trigger such problems in your
scripts in obvious ways, but as I said before, if you want to work
reliably with characters not representable in the current native
encoding, in the current or a near-future version of R, use Linux or
macOS.
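
For illustration only, here is a minimal sketch of the two approaches
mentioned above: reading a UTF-8 script via eval(parse(file, encoding=)),
and writing the literal with \u escapes by hand. The script contents, the
temporary file and the variable names are made up for this example, and it
assumes the script file is UTF-8 encoded. It does not address the remaining
conversions described above, such as printing the parsed expressions.

```r
# Hypothetical one-line script containing a non-ASCII string literal
# (c with caron, r with caron). It is built from \u escapes here so that
# this example itself stays pure ASCII.
script_line <- "x <- \"\u010d\u0159\""

# Write the script as UTF-8 bytes; useBytes = TRUE avoids re-encoding
# the text to the native encoding on output.
path <- tempfile(fileext = ".R")
writeLines(enc2utf8(script_line), path, useBytes = TRUE)

# Approach 1: parse the file as UTF-8 and evaluate the result in a
# separate environment.
env <- new.env()
eval(parse(file = path, encoding = "UTF-8"), envir = env)

# Approach 2: write the same literal by hand with \u escapes.
y <- "\u010d\u0159"

# Both approaches should give the same string.
same <- identical(enc2utf8(env$x), y)

unlink(path)
```

Comparing code points with utf8ToInt(), rather than printing the strings,
keeps the check independent of how the console renders non-ASCII text.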

Tomas

>
> Tomas