[R] duplicated.data.frame() and POSIXct with DST shift
David Winsemius
dwinsemius at comcast.net
Fri Dec 14 02:01:56 CET 2012
On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:
> Hi,
>
> I encountered the behavior, that the duplicated method for
> data.frames gives "false positives" if there are columns of class
> POSIXct with a clock shift from DST to standard time.
>
> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
> time
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>
> df <- data.frame(time, text="foo")
> duplicated(df)
> [1] FALSE TRUE
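A quick check that the two rows really are different instants. This is a sketch that assumes the system time-zone database knows "Europe/Vienna". It builds the times from epoch seconds (1351382400 is 2012-10-28 00:00 UTC, i.e. 02:00 CEST), so it does not depend on how the platform resolves the ambiguous local time 02:00.

```r
# 1351382400 s = 2012-10-28 00:00:00 UTC = 02:00 CEST in Vienna (assumed tz data)
time <- .POSIXct(c(1351382400, 1351382400 + 3600), tz = "Europe/Vienna")

# On the UTC time line the two values are one hour apart ...
diff(as.numeric(time))                  # 3600
# ... but the local wall-clock reading without a zone is identical
format(time, "%Y-%m-%d %H:%M:%S")       # "2012-10-28 02:00:00" "2012-10-28 02:00:00"
```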
In this instance
>
> This is because the timezone is lost after calling paste():
> do.call(paste, c(df, sep = "\r"))
I suspect the problem arises when 'paste' coerces to character:
> as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"
I think that as.character might get missed since the 'paste' operation
is done internally.
> as.character(time, usetz=TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
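To illustrate the mechanism, the sketch below builds row keys in the same spirit as the do.call(paste, c(df, sep = "\r")) call quoted above, with an explicit format() standing in for the implicit character conversion (an assumption, not the exact internal code path). Dropping the zone makes the keys collide; appending the numeric UTC offset (%z) keeps them apart. It again assumes the tz database knows "Europe/Vienna".

```r
time <- .POSIXct(c(1351382400, 1351382400 + 3600), tz = "Europe/Vienna")

# Keys without zone information: both rows become the same string
key_plain <- paste(format(time, "%Y-%m-%d %H:%M:%S"), "foo", sep = "\r")
# Same keys with the UTC offset appended ("+0200" vs "+0100")
key_zone <- paste(format(time, "%Y-%m-%d %H:%M:%S %z"), "foo", sep = "\r")

duplicated(key_plain)   # FALSE  TRUE  (offset dropped, false positive)
duplicated(key_zone)    # FALSE FALSE  (offset keeps the rows distinct)
```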
--
David.
> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>
>
> I can't really figure out if this behavior is desired or not. If so,
> a short warning in ?duplicated could be helpful. It is mentioned how
> duplicated.data.frame() works, but I didn't find a hint on how to
> handle POSIXct objects properly.
There is no duplicated.POSIXct method.
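Since no POSIXct-specific method is involved, one workaround is to compare POSIXct columns by their numeric value (seconds since 1970-01-01 UTC) before calling duplicated(). The helper below is hypothetical (it is not part of base R or reshape) and is only a sketch; it assumes every POSIXct column should be compared as an absolute instant.

```r
# Hypothetical helper: convert POSIXct columns to numeric seconds so that
# row comparison cannot lose the UTC offset, then call duplicated().
duplicated_posixct_safe <- function(df, ...) {
  is_ct <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_ct] <- lapply(df[is_ct], as.numeric)
  duplicated(df, ...)
}

time <- .POSIXct(c(1351382400, 1351382400 + 3600), tz = "Europe/Vienna")
df <- data.frame(time, text = "foo")
duplicated_posixct_safe(df)   # FALSE FALSE
```

Genuine duplicates are still detected, e.g. a data frame built from time[c(1, 1)] gives FALSE TRUE.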
>
> My particular problem was to cast a data.frame like this one with
> cast() (which calls reshape1(), which calls duplicated()):
>
> df2 <- data.frame(time, time1=as.numeric(time),
> lab=rep(1:3, each=2), value=101:106,
> text=rep(c("foo", "bar"), each=3))
>
> library(reshape)
>
> Using the column of class POSIXct as a variable in the formula gives:
> cast(lab*time~text, data=df2, value="value")
> Aggregation requires fun.aggregate: length used as default
> lab time bar foo
> 1 1 2012-10-28 02:00:00 0 2
> 2 2 2012-10-28 02:00:00 1 1
> 3 3 2012-10-28 02:00:00 2 0
>
> Converting to numeric, casting and converting back works as
> expected, although the timezone is not visible, because
> print.data.frame() calls format.POSIXct() with usetz = FALSE:
> y <- cast(lab*time1~text, data=df2, value="value")
> y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
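The back-conversion above relies on the session time zone being CET, so that local "1970-01-01 01:00" equals the epoch. A sketch that does not depend on the session zone reattaches the class with .POSIXct() and an explicit tz. Here y is a stand-in for the cast() result containing only the numeric time1 column (an assumption for illustration), and "Europe/Vienna" must be known to the tz database.

```r
# Stand-in for the cast() output: only the numeric time1 column matters here
y <- data.frame(time1 = c(1351382400, 1351382400 + 3600))

# Reattach POSIXct with an explicit zone; no local origin string needed
y$time1 <- .POSIXct(as.numeric(y$time1), tz = "Europe/Vienna")

# Show the zone abbreviation so the two rows can be told apart when printed
format(y$time1, "%Y-%m-%d %H:%M:%S %Z")
# "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
```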
>
> Can anyone suggest a more elegant solution?
>
> Best,
> Tobias
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA