[R] regex question
Chuck Taylor
ctaylor at tibco.com
Wed Aug 5 02:30:41 CEST 2009
Ravi,
Here is a third way to do this, but it doesn't make use of regular expressions per se:
> avec <- unlist(strsplit(astr, "")) # First convert astr to a vector
> avec[c(1, 1 + grep(" ", avec))]
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"
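The second expression subscripts avec with the index vector c(1, 1 + grep(" ", avec)): position 1, followed by the position just after each blank. It therefore assumes that the string starts with a letter and that the words are separated by single blanks.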
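As a rough, self-contained sketch of the steps above, the following takes astr from Ravi's original message and carries the result through to the requested form. The names first and acronym are only illustrative and are not part of the original code.

```r
astr <- "There is more than one way to do it"

# Split the string into a vector of single characters
avec <- unlist(strsplit(astr, ""))

# Position 1, plus the position just after each blank
# (assumes words are separated by exactly one blank)
first <- avec[c(1, 1 + grep(" ", avec))]

# Paste the letters together and convert to upper case
acronym <- toupper(paste(first, collapse = ""))
acronym
```

If the string could contain runs of blanks or punctuation, the positions computed this way would no longer all point at letters, so a pattern-based approach is probably the safer choice in that case.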
Here is yet a fourth way that does use a regular expression:
> avec[unlist(gregexpr("\\<[[:alpha:]]", astr))] # avec from above
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"
The components of this regular expression can be broken down as follows:
"\\<" The empty string at the beginning of a word.
The backslash is doubled because "\\" inside an R string literal stands for a single backslash, so the regex engine sees \<.
"[[:alpha:]]" Any alphabetic character, upper or lower case.
gregexpr() returns a list; unlist() converts the list to a vector, each element of which points to the first character of a word in astr. That result can be used to subscript avec.
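The same "positions, then subscript" idea can also be sketched without building avec at all, since substring() accepts vectors of start and end positions. Note that this sketch swaps in a different pattern, "[[:alpha:]]+" (whole runs of letters), rather than "\\<[[:alpha:]]". For this string the match start positions are the same, and the pattern does not depend on the \< extension. The names pos, first, and acronym are illustrative.

```r
astr <- "There is more than one way to do it"

# Start position of each run of letters, i.e. of each word
pos <- unlist(gregexpr("[[:alpha:]]+", astr))

# substring() is vectorised over its start and end arguments,
# so this extracts one character at each start position
first <- substring(astr, pos, pos)

# Paste the letters together and convert to upper case
acronym <- toupper(paste(first, collapse = ""))
acronym
```

Here pos should hold the same nine positions that appear in the gregexpr() output in Ravi's message below, which is why indexing by them recovers the first letter of every word.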
Best regards,
Chuck Taylor
TIBCO Spotfire
Seattle, WA, USA
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ravi
Sent: Tuesday, August 04, 2009 10:28 AM
To: r-help at r-project.org
Subject: [R] regex question
Hi,
I am getting stuck on an apparently simple problem in the use of regular expressions:
I want to collect the first letters of the words in the Perl motto, “There is more than one way to do it”, into the form TIMTOWTDI.
I tried the following code:
##### A regex problem with the Perl motto
astr<-"There is more than one way to do it"
b1<-grep("\\<", astr,value=T)
## This just retrieves the whole string
## Next trial with gregexpr
b2<-gregexpr("\\<",astr)
## This gives :
> b2
[[1]]
[1] 1 7 10 15 20 24 28 31 34
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0
A vector of indices corresponding to the first letter is obtained all right with gregexpr but the next step is not so clear. I am not able to figure out how I can use this information to pick out the letters from the original string. My problem is that I don’t know how I can treat the string as a vector and pluck out the letters.
There may be many ways to do it, but I have not succeeded in coming up with even one way! I will appreciate any tips that I can get.
Thanking you,
Ravi