[R] Using regex to truncate repeating characters
Marc Schwartz
marc_schwartz at me.com
Wed Nov 11 16:06:50 CET 2015
> On Nov 11, 2015, at 3:02 AM, Karl <josip.2000 at gmail.com> wrote:
>
> Hi all,
>
> I'm trying to learn how to use regex inside R. I'm far from an expert when
> it comes to this, but google is my friend when it comes to finding suitable
> pieces of syntax to start building from. For example, this post seems to do
> what I want:
>
> http://stackoverflow.com/questions/12258622/regular-expression-to-check-for-repeating-characters
> However, how do I implement this in R? gsub()?
> For example, with Perl-style regex, are there syntax modifications that
> need to be done before it will work with R?
>
> My task is that I want to truncate/limit repeated characters to 3. If I
> have the string:
> "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"
>
> I want to truncate it to:
> "Looorem ipsum dolor sit ammmet, consectetur adipiscing eliiit"
>
> Thank you!
>
> BR,
> Josip
Hi,
Not extensively tested, but something like this should work:
> text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"
> gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text)
[1] "Looorem ipsum dolor sit ammmet, consectetur adipiscing eliiit"
The regex is looking for any alphanumeric character as a group, which is represented by:
([[:alnum:]])
That is followed by a backreference:
\\1{3,}
which matches three or more further copies of the character captured by that group. The whole pattern therefore matches only runs of four or more identical alphanumeric characters; runs of three or fewer are left untouched.
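To see which runs the pattern actually picks up, one option is to extract them with regmatches() and gregexpr() from base R. This is only a small illustration; the string `short` is a made-up example and not from the original post:

```r
text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"
pattern <- "([[:alnum:]])\\1{3,}"

# Extract every run the pattern matches (4 or more identical characters)
runs <- regmatches(text, gregexpr(pattern, text))[[1]]
print(runs)

# A run of exactly 3 is left alone, a run of 4 is cut back to 3
short <- "aaa bbbb"   # made-up test string
print(gsub(pattern, "\\1\\1\\1", short))
```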
The replacement expression:
\\1\\1\\1
says to replace each matched run with three copies of the captured character.
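If the limit of 3 needs to vary, the same idea can be wrapped in a small function. This is a rough sketch only: truncate_repeats() and its max_run argument are hypothetical names introduced here, not anything from base R, and it has been checked only against the example string:

```r
# Hypothetical helper, not part of base R: cap runs of identical
# alphanumeric characters at max_run.
truncate_repeats <- function(x, max_run = 3) {
  max_run <- as.integer(max_run)
  stopifnot(max_run >= 1)
  # One captured character followed by at least max_run copies of itself
  pattern <- sprintf("([[:alnum:]])\\1{%d,}", max_run)
  # Replacement: the captured character repeated max_run times
  replacement <- paste(rep("\\1", max_run), collapse = "")
  gsub(pattern, replacement, x)
}

text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"
print(truncate_repeats(text))
print(truncate_repeats("aaaa bb", max_run = 2))
```

Because gsub() is vectorized over x, the function should also accept a character vector of strings.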
See ?gsub and ?regex for some additional information.
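On the question of Perl-style syntax: as far as I know, the main adjustment is that each backslash has to be doubled inside an R string literal, and perl = TRUE asks gsub() to use the PCRE engine instead of the default one. A minimal sketch, assuming plain ASCII input (character classes can behave differently in other locales or encodings):

```r
text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"

# Default regex engine, as in the gsub() call shown earlier
res_default <- gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text)

# Same pattern and replacement through the PCRE engine
res_perl <- gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text, perl = TRUE)

# Both engines should agree on this input
print(identical(res_default, res_perl))
```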
Regards,
Marc Schwartz