[R] regexec: Unexpected answer when matching digits
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon May 5 07:04:00 CEST 2014
On 05/05/2014 00:26, Stephen Sentoff wrote:
> Here is my sessionInfo from the Linux machine where I see this behavior.
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
It is a known bug in the TRE regex engine, PR14408. Triggering it requires
a UTF-8 locale and a range repeat modifier (here {2,}). It can be worked
around by using perl=TRUE where that argument is supported (I do not know
why the author of regexec did not support it).
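A minimal sketch of the perl=TRUE workaround, assuming a version of R in which regexec() has no perl argument (as described above). The input strings and the pattern are hypothetical, for illustration only; the original question's exact pattern is not quoted in this message. regexpr(perl = TRUE) uses PCRE rather than TRE, regmatches() extracts the whole match, and the capture.start/capture.length attributes that regexpr() returns under perl = TRUE give the positions of capture groups.

```r
# Hypothetical input and pattern, for illustration only
x   <- c("abc 12345 def", "no digits here", "x7y 42")
pat <- "([0-9]{2,})"   # range repeat modifier {2,}

# The TRE bug (PR14408) needs a UTF-8 locale, so report whether we are in one
if (l10n_info()[["UTF-8"]]) message("UTF-8 locale: TRE range-repeat bug may apply")

# Workaround: use PCRE via regexpr(perl = TRUE) instead of TRE-based regexec()
m    <- regexpr(pat, x, perl = TRUE)
pcre <- regmatches(x, m)   # whole matches only; non-matching elements are dropped
print(pcre)                # pcre is c("12345", "42")

# Capture-group positions come back as matrix attributes (one column per group)
cs  <- attr(m, "capture.start")
cl  <- attr(m, "capture.length")
hit <- as.integer(m) > 0   # as.integer() strips the attributes from m
group1 <- ifelse(hit, substring(x, cs[, 1], cs[, 1] + cl[, 1] - 1L), NA)
print(group1)              # group1 is c("12345", NA, "42")
```

Unlike regexec(), this returns NA (rather than character(0)) for non-matching elements in group1, which keeps the result aligned with the input vector.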
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595