[R] regexp help needed

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 28 13:40:32 CET 2008


On Fri, Nov 28, 2008 at 5:51 AM, Peter Dalgaard
<P.Dalgaard at biostat.ku.dk> wrote:
> Lauri Nikkinen wrote:
>> Hello,
>>
>> I have a vector of dates and I would like to grep the year component
>> from this vector (= all digits
>> after the last punctuation character)
>>
>> dates <- c("28.7.08","28.7.2008","28/7/08", "28/7/2008", "28/07/2008",
>> "28-07-2008", "28-07-08")
>>
>> the resulting vector should look like
>>
>> "08" "2008" "08" "2008" "2008" "2008" "08"
>>
>> I tried something like (Perl style) with no success
>>
>> grep("[[:punct:]]?\\d", dates, value=T, perl=T)
>>
>> Any ideas?
>
>> sub(".*[[:punct:]]([0-9]*$)", "\\1", dates)
> [1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
>> sub(".*[[:punct:]](.*)$", "\\1", dates)
> [1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
>> sub(".*[[:punct:]]", "", dates)
> [1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
>> substring(dates,regexpr("[0-9]*$", dates))
> [1] "08"   "2008" "08"   "2008" "2008" "2008" "08"
>

Here is one more.  This uses strapply from the gsubfn
package, which returns the matches directly.  The
simplify = c argument causes it to return them as a
character vector instead of a list:

library(gsubfn)
strapply(dates, "[0-9]+$", simplify = c)
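
For comparison, the same pattern can be used without an extra package.
This is only a sketch in base R, assuming every element ends in a run of
digits; regmatches() silently drops elements that have no match, so the
result can be shorter than the input when that assumption does not hold.
The variable name years is just illustrative:

```r
dates <- c("28.7.08", "28.7.2008", "28/7/08", "28/7/2008", "28/07/2008",
           "28-07-2008", "28-07-08")

# regexpr() locates the run of digits anchored at the end of each string;
# regmatches() then extracts the matched text as a character vector.
years <- regmatches(dates, regexpr("[0-9]+$", dates))
years
```

If the positions of the results must line up with the input even when
some elements do not match, the sub() solutions quoted above may be the
safer choice, since they always return one element per input string.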


