[R] regex question

Chuck Taylor ctaylor at tibco.com
Wed Aug 5 02:30:41 CEST 2009


Here is a third way to do this, but it doesn't make use of regular expressions per se:

> avec <- unlist(strsplit(astr, ""))  # First convert astr to a vector
> avec[c(1, 1 + grep(" ", avec))]
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

This latter expression subscripts avec with the first position together with 1 + the position of each blank, that is, the position of the character that follows each blank in the character vector.
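The original question asked for the letters joined together as TIMTOWTDI rather than as a vector. A minimal sketch of that last step, assuming the words are separated by single blanks and the string has no leading or trailing blank (the names first_letters and motto are illustrative, not from the original post):

```r
astr <- "There is more than one way to do it"

# Split the string into a vector of single characters
avec <- unlist(strsplit(astr, ""))

# Position 1, plus the position just after each blank
first_letters <- avec[c(1, 1 + grep(" ", avec))]

# Upper-case the letters and paste them into a single string
motto <- paste(toupper(first_letters), collapse = "")
print(motto)
```

If the string could contain runs of blanks, the character after a blank may itself be a blank, so this approach would need an extra filtering step.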

Here is yet a fourth way that does use a regular expression:

> avec[unlist(gregexpr("\\<[[:alpha:]]", astr))]  # avec from above
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

The components of this regular expression can be broken down as follows:

"\\<"           The empty string at the beginning of a word.
                In an R string the backslash must itself be escaped,
                hence the doubled backslash.
"[[:alpha:]]"   Any alphabetic character, upper or lower case

gregexpr() returns a list; unlist() converts the list to a vector, each element of which points to the first character of a word in astr. That result can be used to subscript avec.
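The positions can also be applied to the string directly with substring(), without building avec first. The following is only a sketch: it uses perl = TRUE with \\b (the word-boundary assertion in Perl-compatible regular expressions) in place of the \\< pattern above, which is a substitution made for this example and not the pattern from the post. The names pos, first_letters and motto are likewise illustrative.

```r
astr <- "There is more than one way to do it"

# Assumption: \\b with perl = TRUE stands in for \\< here.
# gregexpr() returns a list with one element per input string;
# unlist() flattens it to a plain vector of start positions.
pos <- unlist(gregexpr("\\b[[:alpha:]]", astr, perl = TRUE))

# substring() is vectorised over its start and stop arguments,
# so this extracts one character at each position
first_letters <- substring(astr, pos, pos)

# Upper-case and join into a single string
motto <- paste(toupper(first_letters), collapse = "")
print(motto)
```

Because substring() works on the original string, this variant does not depend on the words being separated by exactly one blank.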

Best regards,
Chuck Taylor
TIBCO Spotfire
Seattle, WA, USA

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ravi
Sent: Tuesday, August 04, 2009 10:28 AM
To: r-help at r-project.org
Subject: [R] regex question

I am getting stuck on an apparently simple problem in the use of regular expressions:
I want to collect the first letters of the words in the Perl motto, “There is more than one way to do it”, into the form TIMTOWTDI.
I tried the following code:
##### A regex problem with the Perl motto
astr <- "There is more than one way to do it"
b1 <- grep("\\<", astr, value = TRUE)
## This just retrieves the whole string
## Next trial with gregexpr
## This gives:
> b3
[1]  1  7 10 15 20 24 28 31 34
[1] 0 0 0 0 0 0 0 0 0
gregexpr does give me a vector of indices corresponding to the first letter of each word, but the next step is not so clear. I cannot figure out how to use this information to pick out the letters from the original string. My problem is that I don’t know how to treat the string as a vector and pluck out the letters.
There may be many ways to do it, but I have not succeeded in coming up with even one! I would appreciate any tips.
Thanking you,

