[R] How to convert European short dates to ISO format?
Richard O'Keefe
r@oknz @end|ng |rom gm@||@com
Thu Jun 11 11:31:58 CEST 2020
I would add to this that in an important data set I was working with,
most of the dates were dd/mm/yy but some of them were mm/dd/yy and
that led to the realisation that I couldn't *tell* for about 40% of
the dates which they were. If they were all one or the other, no
worries, but when you have people from mixed backgrounds writing in
mixed formats, you have a problem.
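
To put a number on how many dates cannot be told apart, something like the following rough sketch may help. The helper name classify_short_date is made up for illustration, and it only looks at the first two fields, so it ignores month lengths (it would accept 31/02).

```r
# Hypothetical helper: for strings shaped like "a/b/yy", report which
# reading (day first or month first) is possible at all.
classify_short_date <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  a <- as.integer(vapply(parts, `[`, character(1), 1))
  b <- as.integer(vapply(parts, `[`, character(1), 2))
  dmy_ok <- a >= 1 & a <= 31 & b >= 1 & b <= 12   # day first is possible
  mdy_ok <- a >= 1 & a <= 12 & b >= 1 & b <= 31   # month first is possible
  ifelse(dmy_ok & mdy_ok & a != b, "ambiguous",
  ifelse(dmy_ok & mdy_ok, "same either way",
  ifelse(dmy_ok, "dd/mm/yy",
  ifelse(mdy_ok, "mm/dd/yy", "invalid"))))
}

x <- c("23/11/99", "11/23/99", "01/02/03", "05/05/20")
classes <- classify_short_date(x)
table(classes)   # how many dates fall into each group
```

Anything reported as "ambiguous" can only be resolved from outside information, such as who entered the record.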
On Thu, 11 Jun 2020 at 19:17, Martin Maechler <maechler using stat.math.ethz.ch>
wrote:
> >>>>> Rich Shepard
> >>>>> on Wed, 10 Jun 2020 07:44:49 -0700 writes:
>
> > On Wed, 10 Jun 2020, Jeff Newmiller wrote:
> >> Fix your format specification? ?strptime
>
> >>> I have been trying to convert European short dates
> >>> formatted as dd/mm/yy into ISO 8601 format, but the function
> >>> as.Date interprets them as American ones (mm/dd/yy),
> >>> thus I get:
>
> > Look at Hadley Wickham's 'tidyverse' collection as
> > described in R for Data Science. There are date, datetime,
> > and time functions that will do just what you want.
>
> > Rich
>
> I strongly disagree that automatic guessing of date format is a
> good idea:
>
> If you have dates such as 01/02/03, 10/11/12, ...
> you cannot have software (nor a human, for that matter) *guess* for
> you what they mean. You have to *know*, or get that knowledge "exogenously",
> i.e., from context (say "meta data" if you want) that you as
> data analyst must have before you can reliably work with that
> data.
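
For the archive, a minimal sketch of stating the format explicitly in base R, assuming the analyst knows the strings are dd/mm/yy. The sample strings are made up. Note that on input %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx (see ?strptime).

```r
x <- c("11/06/20", "01/02/03")

# The format is stated, not guessed: day/month/two-digit year.
d <- as.Date(x, format = "%d/%m/%y")
format(d, "%Y-%m-%d")    # "2020-06-11" "2003-02-01"

# The same strings read as mm/dd/yy parse without complaint,
# but give different dates.
m <- as.Date(x, format = "%m/%d/%y")
format(m, "%Y-%m-%d")    # "2020-11-06" "2003-01-02"
```

Both calls succeed, which is the point: nothing in the strings themselves says which reading is right.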
>
> There is a global standard (ISO 8601) for dates, e.g. 2020-06-11 for today.
> Such dates have the huge advantage that alphabetical ordering is
> equivalent to time ordering ... and honestly I don't see why
> smart people (such as most? R users) do not all use these much
> more often, notably when it comes to data.
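
A small illustration of the ordering property, using made-up sample dates: sorting the ISO strings as plain text gives chronological order, while sorting the same dates written as dd/mm/yy does not.

```r
iso <- c("2020-06-11", "1999-11-23", "2003-02-01")
dmy <- c("11/06/20", "23/11/99", "01/02/03")   # the same three dates

sort(iso)   # text order equals time order
sort(dmy)   # text order only; 1999 ends up last

# Check against a sort on real Date objects.
identical(sort(iso), format(sort(as.Date(iso))))
```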
>
> But as long as most people in the world don't use that format
> and practically all default formats for dates (e.g. in
> spreadsheets and computer locales) do not use the ISO
> standard, but rather regional conventions, one must add meta
> data to have a 100% guarantee of using the correct format.
>
> Of course, you can often guess correctly with very high
> (subjective) probability, e.g., 11/23/99 is very probably
> the 23rd of Nov, 1999.... and indeed if you have more than a few
> dates, it often helps to guess correctly. But there's no
> guarantee.
>
> No, I state that it is much better to ask the data analyst
> to use their brains a little bit and enter the date format
> explicitly, than to use software that guesses it for them
> correctly most of the time. How would they ever find out in
> the rare cases where the automatic guess is wrong?
>
> Martin Maechler
> ETH Zurich and R Core team
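
On the question of how an analyst would ever find out, one possible approach is to make the explicit format mandatory and to fail loudly when a string does not fit it. This is only a sketch; parse_dates_strict is a hypothetical helper, not an existing function. It catches impossible strings only: a date such as 01/02/03 fits both formats and still needs outside knowledge.

```r
# Hypothetical wrapper: the caller must state the format, and any string
# that does not fit it stops the analysis instead of becoming a silent NA.
parse_dates_strict <- function(x, format) {
  d <- as.Date(x, format = format)
  bad <- is.na(d) & !is.na(x)
  if (any(bad)) {
    stop("strings that do not match '", format, "': ",
         paste(unique(x[bad]), collapse = ", "))
  }
  d
}

parse_dates_strict(c("23/11/99", "11/06/20"), "%d/%m/%y")        # parses
try(parse_dates_strict(c("23/11/99", "11/23/99"), "%d/%m/%y"))   # error: no month 23
```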