[Rd] Bug with `[<-.POSIXlt` on specific OSes
Suharto Anggono Suharto Anggono
@uh@rto_@nggono @end|ng |rom y@hoo@com
Sun Oct 30 11:51:54 CET 2022
I just pointed out that, in https://stat.ethz.ch/pipermail/r-devel/2022-October/082082.html ("A potential POSIXlt->Date bug introduced in r-devel"),
dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
                     # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))
doesn't work generally as an example.
When as.POSIXct(dlt)[1] is NA, it is unexpected to me that as.POSIXct(balancePOSIXlt(dlt))[1] is not NA.
This happens because 'isdst' is 0 in balancePOSIXlt(dlt), unlike in 'dlt', where it is 1. That in turn is because 'isGMT' is TRUE in 'do_balancePOSIXlt' in datetime.c, since 'dlt' has 9 components.
If the content is changed, the possible outputs of 'balancePOSIXlt' that I would expect are:
Option 1: companion of 'as.POSIXct.POSIXlt' applied to the same input, as with function 'mktime' in C
- The input "POSIXlt" object is like the initial struct tm whose pointer is presented to 'mktime'.
- The result of 'as.POSIXct.POSIXlt' is like the return value of 'mktime'.
- The result of 'balancePOSIXlt' is like the final struct tm after 'mktime' is applied.
Option 2: corresponding with 'format.POSIXlt' applied to the same input
'format.POSIXlt' doesn't fix 'wday' or 'yday'.
format(dlt, "%Y-%m-%d %w %j")[c(6, 9)]
# c("2016-04-06 2 341", "2016-04-07 2 341")
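To make Option 1 concrete, here is a minimal sketch in R. It assumes a UTC object so that DST plays no role; the object 'x' and its field values are made up for illustration. The round trip through as.POSIXct() only stands in for the "final struct tm"; it is not a claim about what balancePOSIXlt() does internally.

```r
# Hypothetical out-of-range object: 21:45:00 plus 3615 seconds, in UTC.
x <- .POSIXlt(list(sec = 3615, min = 45L, hour = 21L,
                   mday = 6L, mon = 11L, year = 116L,
                   wday = 2L, yday = 340L, isdst = 0L), tz = "UTC")

# Plays the role of mktime()'s return value: seconds since the epoch.
ct <- as.POSIXct(x)

# Plays the role of the struct tm after mktime() has normalized it.
lt <- as.POSIXlt(ct, tz = "UTC")

# Reference point: the same day at 21:45:00 with in-range fields.
base <- as.POSIXct("2016-12-06 21:45:00", tz = "UTC")

print(as.numeric(ct) - as.numeric(base))    # offset in seconds from 21:45:00
print(unclass(lt)[c("hour", "min", "sec")]) # normalized time-of-day fields
```

Under Option 1, balancePOSIXlt(x) would be expected to agree with 'lt' here, in the same way that as.POSIXct(x) agrees with 'ct'.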
Side issues on 'format.POSIXlt':
- %OSn uses unnormalized 'sec', unlike %S.
format(dlt, "%S %OS3")[1] # "24 -995.858"
-
format(dlt, "%A")[12] # "-Inf"
It is rather strange to me to get "-Inf" from format %A. I would expect to get a weekday name; NA would also be acceptable.
Function 'weekdays' uses it.
The reported issue remains.
x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
Sys.setenv(TZ = "UTC")
x[1] <- NA
# Error in x[[n]][i] <- value[[n]] : replacement has length zero
---------------------------
On Saturday, 22 October 2022, 07:12:51 pm GMT+7, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>>>>> Martin Maechler
>>>>> on Tue, 18 Oct 2022 10:56:25 +0200 writes:
>>>>> Suharto Anggono Suharto Anggono via R-devel
>>>>> on Fri, 14 Oct 2022 16:21:14 +0000 (UTC) writes:
>> I think '[.POSIXlt' and '[<-.POSIXlt' don't need to
>> normalize out-of-range values. I think they just make
>> same length for all components, to ensure correct
>> extraction or replacement for arbitrary index.
> Yes, you are right; this is definitely correct... and
> would be more efficient.
> At the moment, we were mostly focused on *correct*
> behaviour in the case of "ragged" and/or out-of-range
> POSIXlt objects.
>> I have a thought of adding an optional argument for
>> 'as.POSIXlt' applied to "POSIXlt" object. Possible name:
>> normalize adjust fixup
>> To allow recycling only without changing content, instead
>> of TRUE or FALSE, maybe choice, like fixup = c("none",
>> "balance", "normalize") , where "normalize" implies
>> "balance", or adjust = c("none", "length", "content",
>> "value") , where "content" and "value" are synonymous.
> Such an optional argument for as.POSIXlt() would be a
> possibility and could replace the new and for now still
> somewhat experimental balancePOSIXlt().
> +: One advantage of (one of the above proposals) would
> be that it does not take up a new function name.
> -: OTOH, it may be overdoing the semantics
> as.POSIXlt(<POSIXlt>, <some> = <other>)
> and it may be harder to understand by
> non-sophisticated R users, because as.POSIXlt() is a
> generic with several methods, and these extra arguments
> would probably only apply to the as.POSIXlt.default()
> method and there *only* for the case where the argument
> inherits from "POSIXlt" .. and all that being somewhat
> subtle to see for Joe Average UseR
> I agree that it will make sense to get an R-level
> version, either using new arguments in as.POSIXlt() or
> (still my preference) in balancePOSIXlt() to allow to
> "only fill all components".
> HOWEVER note that the "filling" (by recycling) and no
> extra checking will often lead to internally
> inconsistent lt objects. Eg. Daylight saving time
> (isdst = 1 or not) can only be known when the day (and
> hour) is known and that can be shifted by out-of-range
> sec/min/hour .. ((and of course for 1 hour per year, a
> time hour=2 will *need* specification of isdst in order
> to know which of the 2:<min>:<sec> is meant)) also $wday
> and $yday (which are described as read-only) also can only
> be checked after validation or "in-ranging" of the
> sec/min/hour/mday/mon components so their simple
> recycling will typically be incorrect.
> That's why I had opted to *mainly* do full "balancing"
> (in my sense), i.e., simultaneous both filling and
> "in-ranging".
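The point about recycled $wday can be illustrated with a small sketch in R. The object 'x' below is hypothetical and uses UTC to keep DST out of the picture; recomputing the fields via as.POSIXct() is just one convenient way to obtain consistent values for comparison.

```r
# Hypothetical "ragged" object: two days, but wday and yday given only once.
x <- .POSIXlt(list(sec = 0, min = 0L, hour = 12L,
                   mday = c(6L, 7L), mon = 11L, year = 116L,
                   wday = 2L, yday = 340L, isdst = 0L), tz = "UTC")

# Plain recycling copies the single wday value to both elements.
recycled_wday <- rep_len(unclass(x)$wday, 2L)

# Recomputing from the calendar fields gives the weekday of each date.
recomputed_wday <- as.POSIXlt(as.POSIXct(x), tz = "UTC")$wday

print(recycled_wday)
print(recomputed_wday)
```

The two vectors differ in the second element, which is the kind of internal inconsistency that "filling only" can leave behind.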
A few hours ago [R-devel svn rev 83156; 2022-10-22 10:18:38 +0200]
I have committed an enhanced version of balancePOSIXlt() which
now has an optional 'fill.only = F/T' argument.
When TRUE (not by default), it will only do the "filling", i.e.,
recycling of less-than-full-length components, without any
"in-ranging" or much further validity checking.
Currently, almost all POSIXlt methods using balancePOSIXlt(),
notably
[.POSIXlt and [<-.POSIXlt
use balancePOSIXlt(x, fill.only=TRUE ..)
and hence are almost as fast as previously (when they did no
balancing and gave sometimes wrong results or errored in case of
partially filled POSIXlt).
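A minimal sketch of the difference between the two modes, assuming an R version that provides balancePOSIXlt() with the 'fill.only' argument described above (the call is guarded, since older versions do not have the function). The object 'x' is made up for illustration.

```r
# Hypothetical ragged object whose second 'sec' value (75) is out of range.
x <- .POSIXlt(list(sec = c(0, 75), min = 45L, hour = 21L,
                   mday = 6L, mon = 11L, year = 116L,
                   wday = 2L, yday = 340L, isdst = 0L), tz = "UTC")

if (exists("balancePOSIXlt")) {   # not available in older versions of R
  # Filling only: components are recycled to full length, values untouched.
  filled <- balancePOSIXlt(x, fill.only = TRUE)
  print(lengths(unclass(filled)))
  print(unclass(filled)$sec)

  # Full balancing: filling plus "in-ranging" of the values.
  balanced <- balancePOSIXlt(x)
  print(unclass(balanced)[c("min", "sec")])
}
```

With fill.only = TRUE the out-of-range 75 seconds should be kept as is, whereas full balancing is expected to carry it over into the minutes.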
>> By the way, Inf in 'sec' component is out-of-range!
> Yes, the non-finite "values" {+/-Inf, NaN, NA} are all
> "special", and we had decided to allow them for
> compatibility with classes "Date" and "POSIXct".
> BTW, a few days ago, I have updated the
> help("DateTimeClasses") page in R-devel to document a
> bit more, notably that "ragged" and out-of-range POSIXlt
> may exist... see (the always +- current R-devel Help
> pages at)
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html
>> For 'gmtoff', NA or 0 should be put for unknown. A known
>> 'gmtoff' may be positive, negative, or zero. The
>> documentation says ‘gmtoff’ (Optional.) The offset in
>> seconds from GMT: positive values are East of the
>> meridian. Usually ‘NA’ if unknown, but ‘0’ could mean
>> unknown.
>> dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10,-Inf, NA)) + pi,
>>                      # "out of range", non-finite, fractions
>>                      min = 45L, hour = c(21L, 3L, NA, 4L),
>>                      mday = 6L, mon = c(11L, NA, 3L),
>>                      year = 116L, wday = 2L, yday = 340L, isdst = 1L))
>> as.POSIXct(dlt)[1] is NA on Linux with timezone without
>> DST. For example, after Sys.setenv(TZ = "EST")
> Hmm... I needed time to look at the above. Indeed, one
> gets NA (and has in previous versions of R) in such a
> case.
> After applying balancePOSIXlt(), one no longer gets NA.
> Are you proposing that we should do that (or possibly
> simple recycling) in as.POSIXct.POSIXlt() ?
I am still waiting for comments (also by others) or other
remarks or answers on this question/topic..
Martin