[R] Retain last grouping after a strsplit()
David Winsemius
dwinsemius at comcast.net
Tue Dec 11 21:24:19 CET 2012
On Dec 11, 2012, at 11:14 AM, Steven Ranney wrote:
> David and Jim -
>
> Thanks for your help. Your suggestions worked just fine. Now my task
> is to learn why the random-looking string of characters in the first
> part of Jim's sub() statement isn't really so random.
>
Jim's solution can be read as:
Pattern matching phase:
Starting at the beginning of the string ("^"), move along the characters (".*?", a non-greedy match of any characters) until you reach a run of one or more characters in the range "0" to "9" ("[0-9]+") that sits immediately before the end of the string ("$"). The parentheses store that run of digits as the first matched group, referred to as "\\1". Because the pattern is anchored at both ends, it matches the whole string.
Substitution phase:
Replace what is matched (the whole string in this case) with just the first numbered matched group, "\\1".
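The two phases can be sketched in code. The vector x is the one from the thread quoted below; the regexec() call and the greedy variant are not from the thread and are added here only for illustration:

```r
x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10")

# Matching phase: regexec() reports the whole match and each
# parenthesized group; element 2 is the group that "\\1" refers to.
m <- regmatches(x, regexec("^.*?([0-9]+)$", x))
whole  <- sapply(m, "[", 1)  # the entire string, since the pattern runs from ^ to $
group1 <- sapply(m, "[", 2)  # the trailing run of digits

# Substitution phase: replace the whole match with group 1.
lazy <- sub("^.*?([0-9]+)$", "\\1", x)

# For contrast, a greedy ".*" takes as many characters as it can
# and leaves only the final digit for the group.
greedy <- sub("^.*([0-9]+)$", "\\1", x)

print(group1)
print(lazy)
print(greedy)
```

The "?" after ".*" is what matters for the last element: with the greedy form it comes back as "0" rather than "10".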
Notice that this could be thought of as a "positive replacement" (keep the part you want), in contrast to my solution and Gabor Grothendieck's later and slightly more compact version, which could be called "negative replacements" (delete the part you do not want).
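The practical difference between the two styles shows up when an element has no trailing digits. The following is a minimal sketch; x2 is a shortened version of the example vector, and last_digits() is a hypothetical helper, not something proposed in the thread:

```r
x2 <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-10", "OYS-PIA2-LA-")

# Hypothetical helper: extract the trailing digits, returning NA where
# the pattern does not match instead of handing back the input itself.
last_digits <- function(s) {
  pattern <- "^.*?([0-9]+)$"
  ifelse(grepl(pattern, s), sub(pattern, "\\1", s), NA_character_)
}

positive <- sub("^.*?([0-9]+)$", "\\1", x2)  # unmatched input is returned as-is
negative <- sub("^.+-", "", x2)              # everything through the last dash is removed
guarded  <- last_digits(x2)

print(positive)
print(negative)
print(guarded)
```

Whether an empty string, the untouched input, or NA is the right answer for a malformed label depends on what happens to the result downstream; the guard just makes the failure explicit.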
--
David
> Thanks again -
>
> SR
> Steven H. Ranney
>
>
> On Tue, Dec 11, 2012 at 11:37 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>>
>> On Dec 11, 2012, at 10:10 AM, jim holtman wrote:
>>
>>> try this:
>>>
>>>> x
>>>
>>> [1] "OYS-PIA2-FL-1" "OYS-PIA2-LA-1" "OYS-PI-LA-BB-1" "OYS-PIA2-LA-10"
>>>>
>>>> sub("^.*?([0-9]+)$", "\\1", x)
>>>
>>> [1] "1" "1" "1" "10"
>>>>
>>>>
>>>
>>>
>>
>> Steve;
>>
>> jim holtman is one of the jewels of the rhelp world. I generally assume that
>> his answers are going to be the most succinct and efficient ones possible
>> and avoid adding noise, but here I thought I would try to improve. Thinking
>> there might be a string-splitting approach, I first tried (and discovered) a
>> not-so-great solution:
>>
>> x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
>> "OYS-PIA2-LA-10")
>> sapply( strsplit(x, "-") , "[", 4)
>> [1] "1" "1" "BB" "10"
>>
>> So then I asked myself if we could just "blank out" everything before the
>> last dash, finding what seemed to be a fairly economical solution and one
>> that does not require back-references:
>>
>> sub( "^.+-" , "", x)
>>
>> [1] "1" "1" "1" "10"
>>
>> If there are no digits after the last dash, these approaches give different
>> results:
>>
>> x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
>> "OYS-PIA2-LA-")
>>
>> sub( "^.+-" , "", x)
>>
>> [1] "1" "1" "1" ""
>>
>> sub("^.*?([0-9]+)$", "\\1", x)
>> [1] "1" "1" "1" "OYS-PIA2-LA-"
>>
>> When a regex pattern does not match, sub and gsub return the input
>> string unchanged.
>>
>> --
>> David.
>>
>>>
>>> On Tue, Dec 11, 2012 at 12:46 PM, Steven Ranney <steven.ranney at gmail.com>
>>> wrote:
>>>>
>>>> OYS-PIA2-FL-1
>>>> OYS-PIA2-LA-1
>>>> OYS-PI-LA-BB-1
>>>> OYS-PIA2-LA-10
>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> David Winsemius, MD
>> Alameda, CA, USA
>>
David Winsemius
Alameda, CA, USA