[R] Regular Expression returning unexpected results

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Oct 29 19:08:14 CET 2013


Please read and follow the Posting Guide, in particular re plain text email.

Keep in mind that the characters of a string literal in R source code have to make it into memory before the regex code can parse them. The regex engine needs to see a single backslash followed by 1 in order to treat it as a back reference, but the R parser also treats the backslash as an escape character in string literals and consumes it first. So you need to escape the backslash with another backslash in your R string literal.
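If your R is version 4.0.0 or later, a raw string literal is another way to get a single backslash to the regex engine, since the parser does not process escapes inside it. A minimal sketch (the variable names are only for illustration):

```r
# Raw string literal (R >= 4.0.0): the parser leaves backslashes alone.
raw_pattern <- r"(^([a-z]+) +\1 +[a-z]+ [0-9])"

# Ordinary string literal: the backslash has to be doubled.
escaped_pattern <- "^([a-z]+) +\\1 +[a-z]+ [0-9]"

# Both literals produce the same string in memory.
same <- identical(raw_pattern, escaped_pattern)
print(same)
```

On older versions of R the raw form is a syntax error, so doubling the backslash remains the portable choice.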

nchar tells you how many characters are in the string. print renders the string as it would need to be entered as R source code. cat sends the string directly to the output (console). Study the output of the following commands at the R prompt.

?Quotes

nchar("^([a-z]+) +\1 +[a-z]+ [0-9]")
print("^([a-z]+) +\1 +[a-z]+ [0-9]")
cat("^([a-z]+) +\1 +[a-z]+ [0-9]")

On most systems, a raw character code 1 is also known as Control-A, but the effect it has on the terminal used as the console may vary according to your setup, and its effect on my system is not clear to me.
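To inspect what the parser produced without sending a control character to the terminal, you can look at the character codes instead. A small sketch (the variable names are only for illustration):

```r
# "\1" is parsed by R as ONE character, with code 1 (Control-A).
one_char <- "\1"

# "\\1" is parsed as TWO characters: a backslash followed by the digit 1.
two_chars <- "\\1"

print(nchar(one_char))       # number of characters in the first string
print(utf8ToInt(one_char))   # its character code
print(nchar(two_chars))      # number of characters in the second string
print(charToRaw(two_chars))  # its bytes: backslash (5c) and "1" (31)
```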

nchar("^([a-z]+) +\\1 +[a-z]+ [0-9]")
print("^([a-z]+) +\\1 +[a-z]+ [0-9]")
cat("^([a-z]+) +\\1 +[a-z]+ [0-9]")
grep("^([a-z]+) +\\1 +[a-z]+ [0-9]",lines)
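For a version you can run without the clipboard, here is a sketch that types the four sample lines in directly and compares the two patterns (the names unescaped and escaped are only for illustration):

```r
# The four sample lines, entered directly instead of read from the clipboard.
lines <- c("going up and up and up",
           "night night at 8",
           "bye bye from up high",
           "heading, heading by 9")

# Single backslash: the pattern contains a Control-A character,
# which none of the lines contain, so nothing matches.
unescaped <- grep("^([a-z]+) +\1 +[a-z]+ [0-9]", lines)

# Doubled backslash: the regex engine sees \1 and treats it as a
# back reference to the first parenthesized group.
escaped <- grep("^([a-z]+) +\\1 +[a-z]+ [0-9]", lines)

print(unescaped)  # integer(0)
print(escaped)    # 2
```

Line 4 does not match because of the comma after the first "heading", and line 3 does not match because it has no digit.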

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

"Lopez, Dan" <lopez235 at llnl.gov> wrote:
>Hi,
>
>So I just took an intro to R programming class and one of the lectures
>was on Regular Expressions. I've been playing around with various R
>functions that use Regular Expressions.
>But this has me stumped. This was part of a quiz and I got it right
>through understanding the syntax. But when I try to run the thing it
>returns 'integer(0)'. Can you please tell me what I am doing wrong?
>
>#I copied and pasted this:
>going up and up and up
>night night at 8
>bye bye from up high
>heading, heading by 9
>
>#THEN
>lines<-readLines("clipboard")
>#This is what it looks like in R
>lines
>[1] "going up and up and up"
>[2] "night night at 8"
>[3] "bye bye from up high"
>[4] "heading, heading by 9"
>
>#THIS IS WHAT IS NOT WORKING THE WAY I THOUGHT. I was expecting it to
>return 2.
># "night night at 8" follows the pattern: Begins with a word then has
>at least one space then the same word then has at least one space then
>a word then a space then a single digit number.
>grep("^([a-z]+) +\1 +[a-z]+ [0-9]",lines)
>integer(0)
>
>#But simple examples DO work
>grep("[Hh]",lines)
>[1] 2 3 4
>grep('[0-9]',lines)
>[1] 2 4
>
>	[[alternative HTML version deleted]]


