[R] Do grep() and strsplit() use different regex engines?

Charles C. Berry ccberry at ucsd.edu
Sun Jul 12 01:26:21 CEST 2015


On Sat, 11 Jul 2015, Bert Gunter wrote:

> David/Jeff:
>
> Thank you both.
>
> You seem to confirm that my observation of an "infelicity" in
> strsplit() is real. That is most helpful.
>
> I found nothing in David's message 2 code that was surprising. That
> is, the splits shown conform to what I would expect from "\\b" . But
> not to what I originally showed and David enlarged upon in his first
> message. I still don't really get why a split should occur at every
> letter.
>
> Jeff may very well have found the explanation, but I have not gone
> through his code.
>
> If the infelicities noted (are there more?) by David and me are not
> really bugs -- and I would be frankly surprised if they were -- I
> would suggest that perhaps they deserve mention in the strsplit() man
> page. Something to the effect that "\b and \< should not be used as
> split characters..." .

Bert et al,

?strsplit already says:

"If empty matches occur, in particular if split has length 0, x is split 
into single characters."

And empty matches can happen in various ways besides using "\\b" as the 
split argument. But there would be no harm in adding your cases to the 
'in particular ...' clause.
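
A minimal sketch of the documented behaviour, for anyone who wants to try it. The input string is the one used elsewhere in this thread. The pattern "x*" is only an illustrative assumption, chosen because it can match the empty string. No output is claimed for the last two calls; run them and compare against the first.

```r
x <- "red green"

# Documented case: a zero-length split gives single characters.
chars <- strsplit(x, "")[[1]]
print(chars)

# "\\b" matches the empty string at a word boundary, so every match
# it produces is an empty match.
by_boundary <- strsplit(x, "\\b")[[1]]
print(by_boundary)

# Another pattern that can match the empty string (illustrative only).
by_star <- strsplit(x, "x*")[[1]]
print(by_star)

# Check whether the empty-match cases agree with the split = "" case.
print(identical(chars, by_boundary))
print(identical(chars, by_star))
```

If the last two lines print TRUE, the "\\b" result is just the general empty-match rule at work rather than something specific to word boundaries.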

The comment in the code (src/main/grep.c: line 493) suggests this was a 
deliberate decision. However, similar functions in other languages do not 
split into single characters on an empty match.

For example, in Emacs, `(split-string "red green" "\\b")' gives

 	("" "red" " " "green" "")

as the result.
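
For comparison, one possible way to get a similar word / non-word tokenization in R is to extract the runs instead of splitting on the boundary. This is only a sketch: the use of gregexpr() with regmatches() and the pattern "\\w+|\\W+" are assumptions made for illustration, not something proposed in this thread, and it does not reproduce the leading and trailing empty strings that Emacs returns.

```r
x <- "red green"

# Extract maximal runs of word characters or of non-word characters,
# rather than splitting at the zero-width boundary between them.
tokens <- regmatches(x, gregexpr("\\w+|\\W+", x))[[1]]
print(tokens)
```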

Chuck


