[R] Regular expression to define contents between parentheses

Bert Gunter gunter.berton at gene.com
Tue Aug 25 22:41:46 CEST 2009


I believe that this is, indeed, tough; it might require Perl regexes to do
entirely within the regular expression language. You might also wish to
check out the gsubfn package to see if it could help.
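
As a minimal sketch, assuming the parentheses are balanced and never nested,
a single gsub() call with a negated character class might already do the job.
The pattern below is only an illustration, not something taken from the
original thread, and it would not handle nested groups:

```r
# Example data from the original question, with each string on one line.
myvector <- c("something (80 km/h, sd) & more (6 kg/L,sd)",
              "somethingelse (48 m/s, sd) & moretoo (50g/L , sd)")

# Match optional whitespace, a literal "(", any run of characters that are
# not ")", and the closing ")"; replace each such group with nothing.
# Assumption: no nested parentheses.
subvector <- gsub("\\s*\\([^)]*\\)", "", myvector)
```

Because [^)]* cannot run past a ")", each match stops at the nearest closing
parenthesis instead of extending to the last one in the string.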

However, a reasonably simple alternative approach that I think will work is
to use strsplit():

1. Split on "("
2. lapply on the resulting list of vectors and remove all elements from each
vector that contain a ")" using, e.g. grep().
3. sapply paste() on the now "cleaned" list to get back the cleaned-up vector.

I leave it to you to work out details -- or point out why I'm wrong.
Alternatively, wait for someone smarter to reply -- which I'm sure will
occur given the clarity with which you posed your problem.
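
For the record, the strsplit() steps above could be sketched roughly as
follows, assuming balanced, non-nested parentheses. Note one deviation:
dropping every element that contains a ")" would also discard the text that
follows the closing parenthesis (" & more" in the example), so this sketch
instead strips each piece only up to and including its first ")" with sub().
The names pieces and cleaned are made up for illustration.

```r
# Example data from the original question, with each string on one line.
myvector <- c("something (80 km/h, sd) & more (6 kg/L,sd)",
              "somethingelse (48 m/s, sd) & moretoo (50g/L , sd)")

# Step 1: split each string at every "(" (fixed = TRUE: literal, not regex).
pieces <- strsplit(myvector, "(", fixed = TRUE)

# Step 2: in each piece, delete everything up to and including the first ")".
# Pieces without a ")" (the text before the first "(") are left unchanged.
cleaned <- lapply(pieces, function(p) sub("^[^)]*\\)", "", p))

# Step 3: paste the pieces of each string back together, then collapse
# repeated whitespace and trim the ends.
subvector <- sapply(cleaned, paste, collapse = "")
subvector <- trimws(gsub("\\s+", " ", subvector))
```

The whitespace cleanup in step 3 is needed because the blanks on either side
of each removed group are both left behind.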


Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Judith Flores
Sent: Tuesday, August 25, 2009 1:18 PM
To: RHelp
Subject: [R] Regular expression to define contents between parentheses

Hello dear R-helpers,

   I haven't been able to figure out, or find a solution in the R-help
archives for, how to delete all the characters contained in groups of
parentheses. I have a vector that looks more or less like this:

myvector<-c("something (80 km/h, sd) & more (6 kg/L,sd)",
            "somethingelse (48 m/s, sd) & moretoo (50g/L , sd)")

I want to extract all the strings that are not contained in parentheses. The
goal would be to obtain the following new vector:

subvector<-c("something & more", "somethingelse & moretoo")

I tried the following, but this pattern seems to enclose all that is
included between the first opened parenthesis and the last closed
parenthesis, which makes sense, but it's not what I need:


Your help would be very much appreciated.

Thank you,


R-help at r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.