[R] Unexpected date format coercion
Enrico Schumann
e@ @end|ng |rom enr|co@chum@nn@net
Thu Jul 1 12:46:08 CEST 2021
On Thu, 01 Jul 2021, Jeremie Juste writes:
> Hello
>
> On Thursday, 1 Jul 2021 at 08:25, PIKAL Petr wrote:
>> Hm.
>>
>> Seems to me that both your code examples are wrong, but printing in Linux is
>> different from Windows.
>>
>> With
>> as.Date("20-12-2020","%Y-%m-%d")
>> you say that 20 is the year (actually year 20) and 2020 is the day, of which only the first
>> two digits are taken (with some values the result is NA)
>>
>> I can confirm that R 4.0.3 on Windows behaves this way too.
>>> as.Date("20-12-2020","%Y-%m-%d")
>> [1] "0020-12-20"
>
> Many thanks for confirming this.
>
>
> On Thursday, 1 Jul 2021 at 18:22, Jim Lemon wrote:
>> Hi Jeremie,
>> Try:
>>
>> as.Date("20-12-2020","%y-%m-%d")
>> [1] "2020-12-20"
>
> Thanks for this info. I'm looking for something that produces NA if the
> date is not exactly in the specified format, so that it can be
> corrected. I was relying on the format argument of as.Date for that.
>
> The issue is that there can be so many variations in date format that, for the time
> being, I still find it easier to delegate the correction to the user. A
> particularly nasty case is when there are multiple date formats in the same
> column.
>
>
> Best regards,
> Jeremie
>
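Petr's and Jim's explanations can be checked by looking at the parsed components rather than the printed dates, since (as noted above) the printed form may differ across platforms. The sketch below uses only base R; `ymd` is a hypothetical helper introduced here for illustration. It relies on two documented behaviors of `strptime`, which `as.Date` uses when a format is given: each string is processed only as far as the format requires, with trailing characters ignored, and `%y` maps 00-68 to 2000-2068.

```r
## Hypothetical helper: return year/month/day via POSIXlt, so the result
## does not depend on how a platform prints years below 1000.
ymd <- function(d) {
    lt <- as.POSIXlt(d)
    c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday)
}

## %Y consumes "20" as the year 20; %d then reads only "20" from "2020",
## and the remaining "20" is ignored.
d1 <- as.Date("20-12-2020", "%Y-%m-%d")
ymd(d1)

## %y maps two-digit years 00-68 to 2000-2068, so "20" becomes 2020.
d2 <- as.Date("20-12-2020", "%y-%m-%d")
ymd(d2)

## Trailing characters are ignored in general, so this parses too.
d3 <- as.Date("2020-12-20 and more", "%Y-%m-%d")
ymd(d3)
```

This is why the format alone cannot reject "20-12-2020": the leading part of the string satisfies the format, and the leftover characters are silently dropped.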
You could explicitly test whether the specified format
is as expected, perhaps with a regex such as
s <- c("2020-01-20", "20-12-2020")
grepl("^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$", s)
and/or by checking the components of the dates:
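valid_Date <- function(s) {
    ## split "YYYY-MM-DD" into its components; `[` returns NA
    ## (instead of an error) when a component is missing
    tmp <- strsplit(s, "[-]")
    year <- as.numeric(sapply(tmp, `[`, 1))
    valid.year <- year < 2500 & year > 1800  ## plausible range
    month <- as.numeric(sapply(tmp, `[`, 2))
    valid.month <- month >= 1 & month <= 12
    day <- as.numeric(sapply(tmp, `[`, 3))
    valid.day <- day >= 1 & day <= 31
    ## parse with an explicit format; non-matching strings become NA
    ans <- as.Date(s, format = "%Y-%m-%d")
    ok <- valid.year & valid.month & valid.day
    ans[is.na(ok) | !ok] <- NA
    ans
}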
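Another way to get NA for anything not exactly in the specified format is a round trip: parse with the format, format the result back with the same format, and keep only the values that reproduce the input string. The helper `strict_as_Date` below is a hypothetical sketch in base R, not an existing function; trying several formats in turn is one possible (assumption-laden) way to handle a column that mixes formats.

```r
## Hypothetical helper: parse `s` strictly. A value is accepted for a format
## only if formatting the parsed date with that same format reproduces the
## input exactly; otherwise it stays NA. Formats are tried in order.
strict_as_Date <- function(s, formats = "%Y-%m-%d") {
    ## start with all-NA dates
    ans <- structure(rep(NA_real_, length(s)), class = "Date")
    for (fmt in formats) {
        ## only strings not yet matched by an earlier format
        todo <- which(is.na(ans) & !is.na(s))
        if (length(todo) == 0L)
            break
        d <- as.Date(s[todo], format = fmt)
        ## round trip; NA dates give NA strings, and FALSE & NA is FALSE
        ok <- !is.na(d) & format(d, format = fmt) == s[todo]
        ans[todo[ok]] <- d[ok]
    }
    ans
}

x <- c("2020-01-20", "20-12-2020", "2020-1-20", "2020-02-30", "20.12.2020")
strict_as_Date(x)
strict_as_Date(x, formats = c("%Y-%m-%d", "%d.%m.%Y"))
```

The round trip is deliberately strict: "2020-1-20" is rejected because it formats back as "2020-01-20". Also, the first matching format wins, so a string that fits several formats (say "01.02.2020" under both "%d.%m.%Y" and "%m.%d.%Y") is resolved only by the order of `formats`.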
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net