Bob O'Hara rni.boh at gmail.com
Wed Mar 11 11:31:30 CET 2015


I'm trying to persuade R's regular expressions to do what I want. I
have a vector of strings which are names of variables, some of which
are elements of vectors (e.g. beta[1]). I want to reformat all of the
variables into a list, so that (for example) beta[1] and beta[2] would
become a single vector in the list. Where I'm struggling is how to pick
out the correct variables.
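If the values themselves sit in a vector parallel to `VarNames`, the list can perhaps be built without matching each name separately: strip everything from the first '[' onward to get a base name, then group with `split()`. A minimal sketch, assuming a hypothetical numeric vector `vals` (made up for illustration) holding one value per entry of `VarNames`:

```r
VarNames <- c("alpha", "beta[1]", "beta[2]", "m", "mu.k", "mu.r")
# Hypothetical values, one per name (invented for this illustration)
vals <- c(0.5, 1.2, 1.9, 3, 4.1, 4.2)

# Base name: drop the first "[" and everything after it, so
# "beta[1]" -> "beta" while "mu.k" is left unchanged
BaseNames <- sub("\\[.*$", "", VarNames)

# Group the values by base name, keeping the order of first appearance
ParList <- split(vals, factor(BaseNames, levels = unique(BaseNames)))

ParList$beta  # c(1.2, 1.9)
```

Because the grouping uses exact base names rather than a prefix match, "m" and "mu.k" end up in separate list elements.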

The problem is that, given a sub-string str, I want to find the strings
that are either identical to str, or are str followed by a '['. I feel I
should be able to do this within a character class if I could give it
an end-of-string character, i.e. '[$\\[]', where $ is not a literal $
but the end of the string (i.e. how it's interpreted outside a
character class).
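For what it's worth, a bracket expression always consumes exactly one character, so the zero-width end-of-string anchor cannot go inside one. One possible workaround, sketched here assuming PCRE is used (`perl = TRUE`), is a negative lookahead: `(?![^\[])` fails only when a next character exists and is something other than '[', so it accepts both the end of the string and a following '[' without '|'. The helper name `match_stem` is hypothetical; `\Q...\E` quotes the stem so that the '.' in names like `mu.k` is taken literally.

```r
VarNames <- c("alpha", "beta[1]", "beta[2]", "m", "mu.k", "mu.r")

# Hypothetical helper: return the elements of `vars` that are exactly
# `stem`, or `stem` immediately followed by "[".
match_stem <- function(stem, vars) {
  # \Q...\E makes the stem literal (so "." is not a wildcard);
  # (?![^\[]) accepts end of string or "[" as the next position
  pattern <- paste0("^\\Q", stem, "\\E(?![^\\[])")
  vars[grepl(pattern, vars, perl = TRUE)]
}

match_stem("alpha", VarNames)  # "alpha"
match_stem("beta", VarNames)   # "beta[1]" "beta[2]"
match_stem("m", VarNames)      # "m"
```

A stem that itself contains the sequence `\E` would break the quoting; that seems unlikely for variable names but is worth keeping in mind.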

Here's an example, using $ where I want the end of string:

> VarNames <- c("alpha", "beta[1]", "beta[2]", "m", "mu.k", "mu.r")
> TryNames <- unique(gsub('[]\\[0-9]',"",VarNames))
> VarNames[grep(paste('^',TryNames[1], '[$\\[]', sep=""), VarNames)] # want "alpha"
> VarNames[grep(paste('^',TryNames[2], '[$\\[]', sep=""), VarNames)] # Gives what I want
[1] "beta[1]" "beta[2]"
> VarNames[grep(paste('^',TryNames[3], '[$\\[]', sep=""), VarNames)] # want "m"
> VarNames[grep(paste('^',TryNames[3], sep=""), VarNames)] # gives more than "m"
[1] "m"    "mu.k" "mu.r"

Is it possible to do this, or will I have to resort to using '|'
(which works but is ugly & will only get uglier in the future)?


