[R] Retain last grouping after a strsplit()

David Winsemius dwinsemius at comcast.net
Tue Dec 11 19:37:03 CET 2012


On Dec 11, 2012, at 10:10 AM, jim holtman wrote:

> try this:
>
>> x
> [1] "OYS-PIA2-FL-1"  "OYS-PIA2-LA-1"  "OYS-PI-LA-BB-1" "OYS-PIA2-LA-10"
>> sub("^.*?([0-9]+)$", "\\1", x)
> [1] "1"  "1"  "1"  "10"
>>
>
>

Steve;

jim holtman is one of the jewels of the rhelp world. I generally
assume that his answers are going to be the most succinct and
efficient ones possible and avoid adding noise, but here I thought I
would try to improve. Thinking there might be a string-splitting
approach, I first tried (and discovered) a not-so-great solution:

  x <- c("OYS-PIA2-FL-1",  "OYS-PIA2-LA-1",  "OYS-PI-LA-BB-1",
         "OYS-PIA2-LA-10")
  sapply( strsplit(x, "-") , "[", 4)
[1] "1"  "1"  "BB" "10"
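
The fixed index 4 fails on the third element because that string has
five dash-separated fields. A minimal sketch of a splitting approach
that takes the last field regardless of how many there are (the name
last_field is only illustrative, and fixed = TRUE is an assumption
that the dash should be treated literally):

```r
x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
       "OYS-PIA2-LA-10")

# Split each string on the literal dash, then keep the final piece
# of each split rather than a hard-coded position.
pieces <- strsplit(x, "-", fixed = TRUE)
last_field <- vapply(pieces, function(p) p[length(p)], character(1))

last_field
# [1] "1"  "1"  "1"  "10"
```

Note that strsplit() does not return a trailing empty string, so an
input that ends in a dash would yield the field before that dash
rather than "".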

So then I asked myself if we could just "blank out" everything before
the last dash, finding what seemed to be a fairly economical solution
and one that does not require back-references:

  sub( "^.+-" , "", x)
[1] "1"  "1"  "1"  "10"

If there were no digits after the last dash these approaches give  
different results:

  x <- c("OYS-PIA2-FL-1",  "OYS-PIA2-LA-1",  "OYS-PI-LA-BB-1",
         "OYS-PIA2-LA-")

  sub( "^.+-" , "", x)
[1] "1" "1" "1" ""

  sub("^.*?([0-9]+)$", "\\1", x)
[1] "1"            "1"            "1"            "OYS-PIA2-LA-"

When a regex pattern does not match an element, sub and gsub return
that element unchanged.
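
If an unchanged string coming back on a non-match is not wanted, one
possible guard is to test for a match first and return NA otherwise.
This is only a sketch: the names pat and last_num are illustrative,
and perl = TRUE is an assumption made so that the lazy quantifier is
handled by the PCRE engine.

```r
x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
       "OYS-PIA2-LA-")

pat <- "^.*?([0-9]+)$"

# grepl() flags which elements match; only those are passed through
# the substitution result, the rest become NA.
last_num <- ifelse(grepl(pat, x, perl = TRUE),
                   sub(pat, "\\1", x, perl = TRUE),
                   NA_character_)

last_num
# [1] "1" "1" "1" NA
```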

-- 
David.

>
> On Tue, Dec 11, 2012 at 12:46 PM, Steven Ranney <steven.ranney at gmail.com> wrote:
>> OYS-PIA2-FL-1
>> OYS-PIA2-LA-1
>> OYS-PI-LA-BB-1
>> OYS-PIA2-LA-10
>
>
>
> -- 
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA



