[R] duplicated.data.frame() and POSIXct with DST shift
David Winsemius
dwinsemius at comcast.net
Fri Dec 14 04:07:42 CET 2012
On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:
>
> On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:
>
>> Hi,
>>
>> I encountered the following behavior: the duplicated method for data.frames gives "false positives" if there are columns of class POSIXct containing times around the clock shift from DST to standard time.
>>
>> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
>> time
>> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>>
>> df <- data.frame(time, text="foo")
>> duplicated(df)
>> [1] FALSE TRUE
>
> In this instance
>>
>> This is because the timezone is lost after calling paste():
>> do.call(paste, c(df, sep = "\r"))
>> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>
> I suspect the problem arises when 'paste' coerces to character:
>
> > as.character(time)
> [1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"
>
> I think the time zone gets lost because 'paste' does this coercion internally, calling as.character() without usetz=TRUE.
>
> > as.character(time, usetz=TRUE)
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>
> --
> David.

This would work as intended if you pre-processed the argument to duplicated() with as.character(..., usetz=TRUE):
> data.frame(lapply(df, as.character, usetz=TRUE))
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo
> duplicated(data.frame(lapply(df, as.character, usetz=TRUE)))
[1] FALSE FALSE

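A minimal sketch of why the pasted row keys collide, and of two keys that keep the instants apart. It assumes the Europe/Vienna zone is present in the system's time-zone database; row_key() is a hypothetical helper written for this illustration, not a function in base R or reshape. The timestamps are built from UTC so that the ambiguous local time "02:00" cannot be resolved differently on another platform, and format(..., usetz = TRUE) is used because usetz is a documented argument of format.POSIXct():

```r
# Sketch: pasted row keys across the DST fall-back in Europe/Vienna.
# Assumes "Europe/Vienna" is in the tz database.
# row_key() is a hypothetical helper, not a base R function.

# 00:00 UTC is 02:00 CEST and 01:00 UTC is 02:00 CET in Vienna, so both
# instants show the same wall-clock time. Building them from UTC avoids
# depending on how the ambiguous local "02:00" is resolved.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"
df <- data.frame(time = time, text = "foo", stringsAsFactors = FALSE)

# Paste all columns of each row into one string, the way the thread
# describes duplicated.data.frame() building its keys. POSIXct columns are
# rendered according to 'how'; other columns use as.character().
row_key <- function(d, how = c("plain", "tz", "numeric")) {
  how <- match.arg(how)
  cols <- lapply(d, function(col) {
    if (!inherits(col, "POSIXct")) return(as.character(col))
    switch(how,
           plain   = format(col),                    # wall clock, zone dropped
           tz      = format(col, usetz = TRUE),      # wall clock plus zone abbreviation
           numeric = as.character(as.numeric(col)))  # seconds since the epoch
  })
  do.call(paste, c(cols, sep = "\r"))
}

# On the POSIXct vector itself, duplicated() compares the underlying numbers.
dup_vector  <- duplicated(time)                    # FALSE FALSE
dup_plain   <- duplicated(row_key(df, "plain"))    # FALSE TRUE (false positive)
dup_tz      <- duplicated(row_key(df, "tz"))       # FALSE FALSE
dup_numeric <- duplicated(row_key(df, "numeric"))  # FALSE FALSE
print(rbind(dup_vector, dup_plain, dup_tz, dup_numeric))
```

The numeric key is the sturdier of the two alternatives, since it does not rely on the two instants carrying different zone abbreviations.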
>>
>>
>
>> I can't tell whether this behavior is intended. If it is, a short warning in ?duplicated would be helpful. The help page explains how duplicated.data.frame() works, but I found no hint on how to handle POSIXct objects properly.
>
> There is no duplicated.POSIXct method.
>>
>> My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()):
>>
>> df2 <- data.frame(time, time1=as.numeric(time),
>> lab=rep(1:3, each=2), value=101:106,
>> text=rep(c("foo", "bar"), each=3))
>>
>> library(reshape)
>>
>> Using the column of class POSIXct as a variable in the formula gives:
>> cast(lab*time~text, data=df2, value="value")
>> Aggregation requires fun.aggregate: length used as default
>>   lab                time bar foo
>> 1   1 2012-10-28 02:00:00   0   2
>> 2   2 2012-10-28 02:00:00   1   1
>> 3   3 2012-10-28 02:00:00   2   0
>>
>> Converting to numeric, casting, and converting back works as expected, although the time zone is not visible because print.data.frame() calls format.POSIXct() with usetz = FALSE:
>> y <- cast(lab*time1~text, data=df2, value="value")
>> y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
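The conversion back above adds seconds to as.POSIXct("1970-01-01 01:00"), which lands on the epoch only when the session itself runs in a UTC+1 zone such as Europe/Vienna. A minimal sketch of a round trip that does not depend on the session's zone, assuming the cast column holds seconds since 1970-01-01 00:00 UTC (what as.numeric() returns for POSIXct) and that Europe/Vienna is the zone wanted for display:

```r
# Sketch: POSIXct -> numeric -> POSIXct without relying on the session zone.
# Assumes "Europe/Vienna" is available in the tz database.

# The two instants from the thread, built from UTC to avoid the ambiguous
# local "02:00": 02:00 CEST and 02:00 CET.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)

# The numeric form used for casting: seconds since 1970-01-01 00:00 UTC.
secs <- as.numeric(time)

# Convert back. as.POSIXct() reads a character origin in UTC (GMT), so the
# instants are recovered exactly; tz only sets the zone used for display.
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")

# print.data.frame() omits the zone, so format with usetz = TRUE to see it.
shown <- format(restored, usetz = TRUE)
print(shown)
```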
>>
>> Can anyone suggest a more elegant solution?
>>
>> Best,
>> Tobias
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Alameda, CA, USA
>
David Winsemius
Alameda, CA, USA