[R-pkg-devel] UTF-8 and raw strings in package code
Mark Bravington
m@rk|nr @end|ng |rom @ummer|n@outh@net
Sun Nov 30 00:57:01 CET 2025
> Wouldn't the obvious thing be to not use an r string here?
"Obvious" in terms of keeping R CMD check happy, certainly, but it'd be antithetical to clear code: the string I included in the post would become incomprehensible to the maintainer (me).
IME raw strings in R are under-appreciated and little-known. They have lots of uses besides regexes, whatever the original intention may have been! E.g., I use raw strings for formatted multi-line comments, documentation, and templated bits of text. Nicer code results.
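As a rough sketch of the templating use (the template text, the <<NAME>> placeholder, and the variable names are all hypothetical, not taken from lyxport), a raw string can hold multi-line text with backslashes and double quotes typed exactly as they should appear, and placeholders can then be filled by a fixed-string substitution:

```r
# Multi-line template; backslashes and double quotes need no escaping.
template <- r"---(
\section{Report for <<NAME>>}
See the file "C:\results\<<NAME>>.tex" for details.
)---"

# fixed = TRUE treats the placeholder as literal text, not as a regex.
filled <- gsub("<<NAME>>", "mark", template, fixed = TRUE)
cat(filled)
```

The same text as a regular string would need every backslash doubled and every double quote escaped.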
Anyway, I'd be perfectly happy with Duncan Murdoch's suggestion of making UTF-8 legit in R code and NAMESPACE generally. I suggested the minor incremental change of "only raw strings" (i) because that's the only thing that affects me ATM, and (ii) just in case UTF-8 had unwelcome implications for strings in general, for legal variable names, etc.
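The case for treating raw strings separately rests on the fact that \uxxxx escapes are not interpreted inside them. A minimal sketch, using a single made-up entry in the style of the table from the original post (ASCII-only here, so it runs regardless of source encoding):

```r
# Regular string: the parser turns the \u escape into one character,
# and each literal backslash or double quote needs its own escape.
escaped <- "\u00c4 \\\"A"

# Raw string: nothing is interpreted, so \u00c4 stays as six literal characters.
raw <- r"--{\u00c4 \"A}--"

nchar(escaped)                  # 5
nchar(raw)                      # 10
substr(raw, 1, 6) == "\\u00c4"  # TRUE
```

So the only way to get a non-ASCII character into a raw string is to type it literally, which is what triggers the check Warning.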
cheers
Mark
On Sun, Nov 30, 2025, at 03:44, Jeff Newmiller via R-package-devel wrote:
> Wouldn't the obvious thing be to not use an r string here? Using r strings does not imply the use of non-ascii characters (AFAIK they are intended for regex patterns), and using regular strings does not imply you cannot use Unicode (with \uxxxx).
>
> At some point I would think that accepting Unicode in package source code would become acceptable... but supporting Unicode in data objects does not implicitly suggest that allowing Unicode in source code has to be supported so your arguments don't IMO really bring any weight to the discussion.
>
> On November 29, 2025 2:55:52 AM PST, Mark Bravington <markinr using summerinsouth.net> wrote:
> >Hi. My package 'lyxport' has R code with several raw strings (see ?Quotes) which contain UTF-8 characters (FWIW: in order to deal with wacky legacy LaTeX characters). For example, one of the strings is:
> >
> > converto <- r"--{
> > Ä \"A ä \"a Á \'A á \'a Ȧ \.A ȧ \.a Ā \=A
> > ā \=a Â \^A â \^a À \`A à \`a Ą \k{A} ą \k{a}
> ><snipped>
> > Ŋ {\NG} Ø {\O} ø {\o} œ {\oe} Œ {\OE} ß {\ss} þ {\th}
> > Þ {\TH}
> > }--"
> >
> >R CMD check is not happy, and gives a Warning:
> >
> >"Portable packages must use only ASCII characters in their R code and NAMESPACE directives, except perhaps in comments. Use \uxxxx escapes for other characters."
> >
> >and indeed that is as stated in "Writing R Extensions", section 1.1.5 ("Package subdirectories") and section 1.6.3 ("Encoding issues").
> >
> >But I wonder if this is still sensible now that
> >
> >(i) R has raw strings (since ~R 4.0);
> >(ii) the DESCRIPTION file explicitly says "Encoding: UTF-8"; and
> >(iii) R >= 4.2 pretty much now enforces UTF-8 in Windows (and UTF-8 could even be a "requirement" of this package, if that helped).
> >
> >With "normal" strings then maybe the \uxxxx thing is reasonable; but shouldn't the contents of raw strings be exempt? You can't put \uxxxx into a raw string, for obvious reasons...
> >
> >cheers
> >Mark
> >
> >
> >PS Of course, there are ways around the Warning (eg storing the strings as files elsewhere in the package, and reading those files during the code) but they are tedious, harder to maintain, and reduce clarity (imagine using \uxxxx in the above!). Since I don't particularly care whether the package goes on CRAN or not (it's living quite happily in R-universe), I've no plans to change my code, but I would prefer to avoid Warnings that then have to be explained to would-be users. And I am probably not the only person affected.
> >
> >PPS The package has been working fine on Windows, Macs, and Linux.
> >
> >______________________________________________
> >R-package-devel using r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-package-devel