[R] Help with regular expressions.

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Tue Feb 9 04:03:26 CET 2021


There are many ways, Rolf. You need to look into the syntax of regular
expressions. Which way fits best depends on how sure you are that every
string has exactly the format you describe. Escaping the period with a
backslash (doubled inside an R string literal, as in "\\.") is one way.
Using string functions is another.

A suggestion: see if you can write a regular expression that is greedy, so
that it matches everything up to the last period (the second one, if there
are exactly two), then that period, then the rest. Keep the first and third
parts and replace the middle with a minus sign. Or match five things:
everything up to the first period, the period, everything between, the
second period, and the rest, and keep the needed parts as above.
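
As a rough sketch of both ideas in R, assuming every string contains exactly
two periods (the vector x and the result names are made up for illustration):

```r
# Hypothetical example data; each string has exactly two periods.
x <- c("a.b.c", "alpha.beta.gamma")

# Greedy version: (.*) runs as far as it can, so the [.] that follows
# ends up matching the LAST period in the string.
greedy <- sub("^(.*)[.](.*)$", "\\1-\\2", x)

# Five-part version: text, first period, text, second period, rest.
# Groups 1, 2, 3 and 5 are kept; group 4 (the second period) becomes "-".
five <- sub("^([^.]*)([.])([^.]*)([.])(.*)$", "\\1\\2\\3-\\5", x)

print(greedy)  # "a.b-c" "alpha.beta-gamma"
print(five)    # "a.b-c" "alpha.beta-gamma"
```

Note that with three or more periods the greedy version changes the last
period rather than the second, while the five-part version still changes the
second.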

Periods and dashes must be used carefully, though. An unescaped period
matches almost any single character, so a good way to match a literal one is
[.], which matches only a period. Put parens around that, "([.])", and you
have a group you can refer to in the replacement. In your case you may want
the parens around everything before and after instead, perhaps
([^.]*[.][^.]*) then [.] then ([^.]*) as one long string, replaced with
\1-\2 (written "\\1-\\2" inside an R string) or some similar notation.
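
Something along these lines may do it in R. This is only a sketch: the test
strings are invented, and the first group is written with a trailing * so
that the middle part can be longer than one character.

```r
# An unescaped period matches any character, so the first character is hit.
print(sub(".", "-", "a.b.c"))    # "-.b.c"

# Inside brackets the period is literal, so the first real period is hit.
print(sub("[.]", "-", "a.b.c"))  # "a-b.c"

# Hypothetical example data, including awkward cases.
x <- c("a.b.c", "one.two.three", "a.b.c.d", "no periods")

# Group 1: everything through the first period and the text after it.
# Then the second period, unparenthesized because it is being replaced.
# Group 2: the text after the second period, up to any later period.
fixed <- sub("^([^.]*[.][^.]*)[.]([^.]*)", "\\1-\\2", x)

print(fixed)  # "a.b-c" "one.two-three" "a.b-c.d" "no periods"
```

Strings with fewer than two periods do not match and come back unchanged.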

There are many other variations on this theme, and some are simpler if the
exact format is consistent, such as 'a' being a single character or the
string being a fixed length. If you are sure the second period in "a.b.c" is
always the fourth character, no RE is needed. Use string methods. Even if
not, you can use string methods to search for a period from the end
backwards, or search forward once to find the first period and a second time
starting just past it. Then replace. Fairly straightforward and very
possibly much faster.
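
A minimal sketch of the string-method route in base R follows. The helper
name replace_second_period is hypothetical, and I have not timed it against
the regular expression versions.

```r
# Fixed-position case: the second period is known to be character 4.
y <- "a.b.c"
substr(y, 4, 4) <- "-"
print(y)  # "a.b-c"

# General case: locate every literal period (fixed = TRUE turns off
# regular expression interpretation) and overwrite the second one.
replace_second_period <- function(s) {
  pos <- gregexpr(".", s, fixed = TRUE)[[1]]
  # With no match gregexpr() returns a single -1, so the length test
  # also covers strings that contain no period at all.
  if (length(pos) >= 2) substr(s, pos[2], pos[2]) <- "-"
  s
}

# The helper handles one string at a time, so apply it over a vector.
out <- vapply(c("a.b.c", "one.two.three", "x.y"),
              replace_second_period, character(1), USE.NAMES = FALSE)
print(out)  # "a.b-c" "one.two-three" "x.y"
```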

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rolf Turner
Sent: Monday, February 8, 2021 9:29 PM
To: r-help using R-project.org
Subject: [R] Help with regular expressions.


I want to deal with strings of the form "a.b.c" and to change (using
sub() or whatever is appropriate) the second "." to a "-", i.e. to change
"a.b.c" to "a.b-c".  I want to leave the first "." as-is.

I guess I could do a gsub(), changing all "."s to "-"s, and then do a sub()
changing the first "-" back to a ".".  But this seems very kludgy.  There
must be a sexier way.  Mustn't there?  Is there regular expression syntax
for picking out the second occurrence of a particular string?

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
