[R] Confusion with Converting Factors to Dates using as.Date

Marc Schwartz marc_schwartz at comcast.net
Wed Dec 10 22:25:46 CET 2008


on 12/10/2008 02:41 PM Josip Dasovic wrote:
> Dear R-Helpers:
> 
> I'm having a problem getting dates into the correct format. I have a
> data frame, which is based on a .csv file that I imported into R via
> read.table.
> 
> R has converted my date variables to factors; when I use the as.Date
> command, most of the values are converted "correctly" (and by this I
> guess I mean converted "as I wish them to be") but some have not
> been.
> 
> Here's what I have:
> 
> str(pk.df)
> 'data.frame':	206 obs. of  134 variables:
>  $ uniqid  : int  010 015 120 130 210 245 320 330 415 ...
>  $ st_date : Factor w/ 154 levels "01/01/48","01/01/51",..: 46 27 NA 12 118 NA 63 127 NA NA ...
> ...
> 
> I then convert them to a date class using
> 
> st_date.new<-as.Date(st_date, "%m/%d/%y")
> 
> This _seems_ to work...
> 
> str(st_date.new)
>  Class 'Date'  num [1:206]  8150  8466    NA 33982 10149 ...
> 
> But notice the 4th observation; I would like it to be 1963, not 2063.
> 
> 
> st_date.new[1:10]
>  [1] "1992-04-25" "1993-03-07" NA           "2063-01-15" "1997-10-15"
>  [6] NA           "1991-05-31" "1994-11-20" NA           NA
> 
> st_date[1:10]
>  [1] 04/25/92 03/07/93 <NA>     01/15/63 10/15/97 <NA>     05/31/91
>  [8] 11/20/94 <NA>     <NA>
> 154 Levels: 01/01/48 01/01/51 01/01/52 01/01/59 01/01/63 ... 12/31/96
> 
> 
> I thought that the problem might be that I was converting a factor,
> so I first converted the variable to a character type (although I
> understand that this is done automatically) and then to date class,
> but I still had the same problem. Does anybody know how I can solve
> this and why I am getting this behavior? One more tidbit: the
> earliest date for which the date conversion is "correct" is
> 1969-04-15, while the most recent date for which the century is
> "incorrect" is 1967-11-05.
> 
> Thanks, Josip

This is a consequence of using two-digit years rather than four-digit
years, which, BTW, was one of the Y2K issues raised a decade ago...

As per ?strptime:

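%y
    Year without century (00–99). If you use this on input, which
    century you get is system-specific. So don't! Often values up to
    68 (or 69) are prefixed by 20 and 69 (or 70) to 99 by 19.

That matches what you are seeing: 63 and 67 come back as 2063 and 2067,
while 69 comes back as 1969.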
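If you want to see which century your own system picks, a quick check along these lines may help. It is only a sketch: the sample strings mix values from your post with a couple of made-up ones around the usual 68/69 cutoff, and the result can depend on platform and R version, so no output is shown.

```r
# Two-digit years on either side of the usual 68/69 cutoff
# (12/31/68 and 01/01/69 are made up for illustration)
x <- c("01/15/63", "11/05/67", "12/31/68", "01/01/69", "04/15/69")

# Parse with %y and display the four-digit year that was chosen
parsed <- as.Date(x, format = "%m/%d/%y")
data.frame(input = x, parsed = parsed, year = format(parsed, "%Y"))
```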



If you know that all of your dates are going to be before 2000, you can
do the following, by using a regex to convert the two digit year to a
four digit year and then use as.Date() with '%Y':

> st_date <- "01/15/63"

> sub("([0-9]{2})$", "19\\1", st_date)
[1] "01/15/1963"

> as.Date(sub("([0-9]{2})$", "19\\1", st_date), format = "%m/%d/%Y")
[1] "1963-01-15"
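The same idea should carry over to a whole factor column with missing values. Below is a minimal sketch, assuming a small stand-in factor built from the first few values shown in your post (the real column lives in pk.df) and assuming, as above, that every date is before 2000. The name st_date.chr is just an illustrative intermediate.

```r
# Stand-in for the st_date column: a factor with a missing value
st_date <- factor(c("04/25/92", "03/07/93", NA, "01/15/63", "10/15/97"))

# Convert the factor to character explicitly, then prefix the
# trailing two-digit year with "19"; NA values stay NA
st_date.chr <- sub("([0-9]{2})$", "19\\1", as.character(st_date))

# Parse using the four-digit year format
st_date.new <- as.Date(st_date.chr, format = "%m/%d/%Y")
st_date.new
```

With the real data frame, the input would presumably be pk.df$st_date rather than the stand-in factor.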


The better option is to ensure that the source of your data outputs or
exports dates with a four digit year, before importing into R.

See ?sub and ?regex

HTH,

Marc Schwartz


