[R] question about string handling....
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Jul 14 22:08:05 CEST 2010
On Wed, Jul 14, 2010 at 2:21 PM, karena <dr.jzhou at gmail.com> wrote:
>
> Hi,
>
> I have a data.frame as following:
> var1 var2
> 1 ab_c_(ok)
> 2 okf789(db)_c
> 3 jojfiod(90).gt
> 4 "ij"_(78)__op
> 5 (iojfodjfo)_ab
>
> what I want is to create a new variable called "var3". the value of var3 is
> the content in the Parentheses. so var3 would be:
> var3
> ok
> db
> 90
> 78
> iojfodjfo
>
Here are several alternatives. The gsub solution matches everything
up to and including the ( as well as the ) and everything after it,
and replaces each match with nothing. The strsplit solution splits
each string into three fields (everything before the (, everything
within the (), and everything after the )) and then picks off the
second. The strapply solution matches everything from ( to ) and
returns everything between them.
The code below works whether DF$var2 is a factor or character, but if
you know it's character you can drop the as.character in #2 and #3.
# 1: gsub
gsub(".*[(]|[)].*", "", DF$var2)
# 2: strsplit
sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)
# 3: strapply (from the gsubfn package)
library(gsubfn)
strapply(as.character(DF$var2), "[(](.*)[)]", simplify = TRUE)
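
For anyone who wants to try this out, here is a minimal self-contained
sketch that rebuilds the data from the question and runs the two
base-R solutions. The data frame name DF follows the code above; the
choice of stringsAsFactors = TRUE is my assumption, made only to
exercise the factor case, and the helper names v1 and v2 are
hypothetical.

```r
# Rebuild the data from the question; the name DF follows the reply above.
# stringsAsFactors = TRUE is an assumption, chosen to exercise the factor case.
DF <- data.frame(
  var1 = 1:5,
  var2 = c("ab_c_(ok)", "okf789(db)_c", "jojfiod(90).gt",
           "\"ij\"_(78)__op", "(iojfodjfo)_ab"),
  stringsAsFactors = TRUE
)

# 1: delete everything up to "(" and everything from ")" onward
v1 <- gsub(".*[(]|[)].*", "", DF$var2)

# 2: split on either parenthesis and keep the second piece
v2 <- sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# Both approaches should agree on this data; store the result as var3
stopifnot(identical(v1, v2))
DF$var3 <- v1
print(DF)
```

Both solutions assume each string contains exactly one pair of
parentheses, as in the sample data. If a string had no parentheses,
#1 would return it unchanged while #2 would return NA, so it may be
worth checking for that case before relying on either.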