[R] Extracting text from a character string
    Marc Schwartz 
    marc_schwartz at comcast.net
       
    Fri Mar  9 21:44:47 CET 2007
    
    
  
On Fri, 2007-03-09 at 15:23 -0500, Shawn Way wrote:
>  I have a set of character strings like below:
>  
> > data3[1]
> [1] "CB01_0171_03-27-2002-(Sample 26609)-(126)"
> > 
>  
> I am trying to extract the text 03-27-2002 and convert this into a date 
> for the same record.  I keep looking at the grep function, however I 
> cannot quite get it to work.
>  
> grep("\d\d-\d\d-\d\d\d\d",data3[1],perl=TRUE,value=TRUE)
>  
> Any hints?
At least two different ways:
Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"
1. Using substr(), if the strings in your source vector have a fixed format:
# Get the 11th through the 20th characters
> substr(Vec, 11, 20)
[1] "03-27-2002"
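The question also asks for a date, which the line above does not yet produce. A minimal sketch, assuming every record shares the same fixed layout (the second string below is hypothetical, made up for illustration): substr() is vectorized, so it cuts every element at the same positions, and as.Date() with format = "%m-%d-%Y" parses month-day-year text into R's Date class.

```r
# Hypothetical records; only the first string comes from the post
data3 <- c("CB01_0171_03-27-2002-(Sample 26609)-(126)",
           "CB01_0172_04-15-2002-(Sample 26610)-(127)")

# substr() is vectorized: characters 11 through 20 of every element
datestr <- substr(data3, 11, 20)

# Parse month-day-year strings into Date objects
dates <- as.Date(datestr, format = "%m-%d-%Y")
dates
# [1] "2002-03-27" "2002-04-15"
```

The result can be stored alongside the original strings (e.g. as a new column in a data frame) so each date stays with its record.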
2. Using sub() for a more general approach that does not depend on the date's position:
# Use a back reference to return only the text matched by the
# pattern inside the parentheses
> sub(".+([0-9]{2}-[0-9]{2}-[0-9]{4}).+", "\\1", Vec)
[1] "03-27-2002"
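A caveat on the sub() approach: when an element does not match, sub() returns it unchanged rather than returning NA, and the leading and trailing .+ each require at least one character, so a date at the very start or end of a string would not match (.* relaxes that). A minimal sketch of guarding against non-matches with grepl(), assuming a hypothetical second record that contains no date:

```r
# Only the first string comes from the post; the second is hypothetical
Vec2 <- c("CB01_0171_03-27-2002-(Sample 26609)-(126)",
          "CB01_0172_no-date-here")
pat <- ".+([0-9]{2}-[0-9]{2}-[0-9]{4}).+"

# Keep the back-referenced date where the pattern matched, NA elsewhere
out <- ifelse(grepl(pat, Vec2), sub(pat, "\\1", Vec2), NA)
out
# [1] "03-27-2002" NA
```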
See ?substr, ?sub and ?regex
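As for the grep() attempt in the question: inside an R string literal the regex \d has to be written as "\\d", and grep() only reports which elements match (or, with value = TRUE, returns the whole matching element), not the matched substring. Later versions of R added regmatches(), which, combined with regexpr(), extracts just the match. A sketch:

```r
Vec <- "CB01_0171_03-27-2002-(Sample 26609)-(126)"

# "\\d" in the string literal becomes the regex \d (a digit)
m <- regexpr("\\d{2}-\\d{2}-\\d{4}", Vec, perl = TRUE)

# Return only the matched text; elements with no match are dropped
regmatches(Vec, m)
# [1] "03-27-2002"
```

Because non-matching elements are dropped, the output can be shorter than the input; the grepl() guard shown for sub() keeps results aligned with their records instead.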
HTH,
Marc Schwartz
    
    
More information about the R-help
mailing list