[Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Apr 10 18:46:20 CEST 2019
On 10/04/2019 12:32 p.m., Jeroen Ooms wrote:
> On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>
>> On 10/04/2019 10:29 a.m., Yihui Xie wrote:
>>> Since it is "technically easy" to disable the best fit conversion and
>>> the best fit is rarely good, how about providing an option for
>>> code/package authors to disable it? I'm asking because this is one of
>>> the most painful issues in packages that may need to source() code
>>> containing UTF-8 characters that are not representable in the Windows
>>> native encoding. Examples include knitr/rmarkdown and shiny. Basically
>>> users won't be able to knit documents or run Shiny apps correctly when
>>> the code contains characters that cannot be represented in the native
>>> encoding.
>>
>> Wouldn't things be worse with it disabled than currently? I'd expect
>> the line containing the "ř" to end up as NA instead of converting to "r".
>
> I don't think it would be worse, because in this case R would not
> implicitly convert strings to (best fit) latin1 on Windows, but
> instead keep the (correct) string in its UTF-8 encoding. The NA only
> appears if the user explicitly forces a conversion to latin1, which is
> not the problem here I think.
>
> The original problem that I can reproduce in RGui is that if you enter
> "ř" in RGui, R opportunistically converts this to latin1, because it
> can. However if you enter text which can definitely not be represented
> in latin1, R encodes the string correctly in UTF-8 form.
>
I think the pathways for text in RGui and text being sourced are
different. I agree fixing RGui in that way would make sense, but Yihui
was talking about source().
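
To make the distinction in this thread concrete, here is a minimal Python sketch of the two behaviours being compared: a strict conversion to latin1 that fails on "ř" (analogous to the NA mentioned above), and a lossy "best fit" that silently turns "ř" into "r". It is only an illustration. The function names strict_latin1 and best_fit_latin1 are hypothetical, and stripping combining marks after NFKD normalization is an assumed stand-in for a best-fit mapping, not what R or Windows actually does internally.

```python
import unicodedata


def strict_latin1(text):
    """Encode to latin-1; return None if any character is not representable.

    None plays the role of the NA that a strict conversion would produce.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


def best_fit_latin1(text):
    """Lossy conversion to latin-1 (assumed approximation of 'best fit').

    Characters that latin-1 can represent are kept as they are. Any other
    character is decomposed with NFKD, its combining marks are dropped, and
    whatever still cannot be encoded is replaced by '?'.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode("latin-1"))
        except UnicodeEncodeError:
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base.encode("latin-1", errors="replace"))
    return b"".join(out)


sample = "\u0159"  # LATIN SMALL LETTER R WITH CARON, not in latin-1

print(strict_latin1(sample))    # strict conversion fails
print(best_fit_latin1(sample))  # lossy conversion keeps only the base letter
print(sample.encode("utf-8"))   # keeping UTF-8 preserves the character
```

The strict path at least signals that information was lost, while the lossy path returns a plausible-looking but wrong string, which is the behaviour the original report complains about.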
Duncan Murdoch