[R] Undesired result
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Wed Feb 17 18:50:01 CET 2021
On 17/02/2021 9:50 a.m., Val wrote:
> HI All,
>
> I am reading a data file which has different date formats. I wanted to
> standardize to one format and used a library anytime but got
> undesired results as shown below. It gave me year 2093 instead of 1993
>
>
> library(anytime)
> DFX<-read.table(text="name ddate
> A 19-10-02
> D 11/19/2006
> F 9/9/2011
> G1 12/29/2010
> AA 10/18/93 ",header=TRUE)
> getFormats()
> addFormats(c("%d-%m-%y"))
> addFormats(c("%m-%d-%y"))
> addFormats(c("%Y/%d/%m"))
> addFormats(c("%m/%d/%y"))
>
> DFX$anew=anydate(DFX$ddate)
>
> Output
> name ddate anew
> 1 A 19-10-02 2002-10-19
> 2 D 11/19/2006 2020-11-19
> 3 F 9/9/2011 2011-09-09
> 4 G1 12/29/2010 2020-12-29
> 5 AA 10/18/93 2093-10-18
>
> The problem is in the last row. It should be 1993-10-18 instead of 2093-10-18
>
> How do I correct this?

This looks a little tricky. The basic idea is that the %y format has to
guess at the century, but the guess depends on things specific to your
system. So what would be nice is to say "two-digit years should be
assumed to fall between 1922 and 2021", but there's no way to do that
directly.
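
To make the "between 1922 and 2021" rule concrete, here is a minimal sketch of the arithmetic in base R. It is only an illustration of the rule, not something anydate() provides; the helper name window_year and its start_year argument are made up for this example.

```r
# Hypothetical helper (not part of anytime or base R): map a two-digit
# year onto the 100-year window that begins at start_year.
window_year <- function(yy, start_year = 1922) {
  # %% returns a non-negative remainder for a positive divisor in R,
  # so the result always lands in start_year .. start_year + 99.
  start_year + ((yy - start_year) %% 100)
}

window_year(c(93, 19, 21, 22))
```

With the default window, 93 maps to 1993 and 19 maps to 2019; the window edges are 22 (1922) and 21 (2021).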

What you could do is recognize when you have a two-digit year, and then
force the result into the range you want. Here's a function that does
that, but it has hardly been tested, so be careful if you use
it. (One thing: I recommend the 'useR = TRUE' option to anydate(); it
worked better in my tests than the default.)

adjustCentury <- function(inputString,
                          outputDate = anydate(inputString, useR = TRUE),
                          start = "1922-01-01") {
  start <- as.Date(start)

  # Treat an input as having a two-digit year if it contains no run of 4 digits
  twodigityear <- !grepl("[[:digit:]]{4}", inputString)

  # Move dates that fall before the window forward, 100 years at a time
  while (length(bad <- which(twodigityear & outputDate < start))) {
    for (i in bad) {
      longdate <- as.POSIXlt(outputDate[i])
      longdate$year <- longdate$year + 100
      outputDate[i] <- as.Date(longdate)
    }
  }

  # The window ends 100 years after it starts
  longdate <- as.POSIXlt(start)
  longdate$year <- longdate$year + 100
  finish <- as.Date(longdate)

  # Move dates that fall after the window back, 100 years at a time
  while (length(bad <- which(twodigityear & outputDate >= finish))) {
    for (i in bad) {
      longdate <- as.POSIXlt(outputDate[i])
      longdate$year <- longdate$year - 100
      outputDate[i] <- as.Date(longdate)
    }
  }

  outputDate
}

library(anytime)
DFX<-read.table(text="name ddate
A 19-10-02
D 11/19/2006
F 9/9/2011
G1 12/29/2010
AA 10/18/93
BB 10/18/1893
CC 10/18/2093",header=TRUE)
addFormats(c("%d-%m-%y"))
addFormats(c("%m-%d-%y"))
addFormats(c("%Y/%d/%m"))
addFormats(c("%m/%d/%y"))
DFX$anew=adjustCentury(DFX$ddate, start = "1921-01-01")
DFX
#> name ddate anew
#> 1 A 19-10-02 2019-10-02
#> 2 D 11/19/2006 2006-11-19
#> 3 F 9/9/2011 2011-09-09
#> 4 G1 12/29/2010 2010-12-29
#> 5 AA 10/18/93 1993-10-18
#> 6 BB 10/18/1893 1893-10-18
#> 7 CC 10/18/2093 2093-10-18
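
The two steps that adjustCentury() relies on can also be tried in isolation with base R only. This is a small sketch for checking the idea, with made-up variable names (x, two_digit, shifted); it does not call anydate() at all.

```r
# Step 1: flag inputs that have no run of four digits, i.e. a two-digit year
x <- c("19-10-02", "11/19/2006", "10/18/93")
two_digit <- !grepl("[[:digit:]]{4}", x)
two_digit

# Step 2: shift a date by one century via the POSIXlt 'year' field
d <- as.Date("2093-10-18")
lt <- as.POSIXlt(d)
lt$year <- lt$year - 100
shifted <- as.Date(lt)
shifted
```

Only the first and third strings are flagged as two-digit years, and the shifted date is 1993-10-18.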