[Rd] Bug with `[<-.POSIXlt` on specific OSes

Martin Maechler maechler at stat.math.ethz.ch
Sat Oct 22 14:12:47 CEST 2022


>>>>> Martin Maechler 
>>>>>     on Tue, 18 Oct 2022 10:56:25 +0200 writes:

>>>>> Suharto Anggono via R-devel 
>>>>>     on Fri, 14 Oct 2022 16:21:14 +0000 (UTC) writes:

    >> I think '[.POSIXlt' and '[<-.POSIXlt' don't need to
    >> normalize out-of-range values. I think they only need to
    >> make all components the same length, to ensure correct
    >> extraction or replacement for an arbitrary index.

    > Yes, you are right; this is definitely correct...  and
    > would be more efficient.

    > At the moment, we were mostly focused on *correct*
    > behaviour in the case of "ragged" and/or out-of-range
    > POSIXlt objects.


    >> I have thought of adding an optional argument for
    >> 'as.POSIXlt' applied to a "POSIXlt" object. Possible
    >> names: normalize, adjust, fixup.

    >> To allow recycling only, without changing content, it
    >> could be a choice instead of TRUE or FALSE, like
    >> fixup = c("none", "balance", "normalize"), where
    >> "normalize" implies "balance", or
    >> adjust = c("none", "length", "content", "value"), where
    >> "content" and "value" are synonymous.

    > Such an optional argument for as.POSIXlt() would be a
    > possibility and could replace the new and for now still
    > somewhat experimental balancePOSIXlt().

    > +: One advantage of (one of the above proposals) would
    > be that it does not take up a new function name.

    > -: OTOH, it may be overdoing the semantics

    >      as.POSIXlt(<POSIXlt>, <some> = <other>)

    >   and it may be harder to understand by
    > non-sophisticated R users, because as.POSIXlt() is a
    > generic with several methods, and these extra arguments
    > would probably only apply to the as.POSIXlt.default()
    > method and there *only* for the case where the argument
    > inherits from "POSIXlt" .. and all that being somewhat
    > subtle to see for Joe Average UseR

    > I agree that it will make sense to get an R-level
    > version, either using new arguments in as.POSIXlt() or
    > (still my preference) in balancePOSIXlt(), to allow one
    > to "only fill all components".

    > HOWEVER note that "filling" (by recycling) with no
    > extra checking will often lead to internally
    > inconsistent lt objects.  E.g., daylight saving time
    > (isdst = 1 or not) can only be known when the day (and
    > hour) is known, and that can be shifted by out-of-range
    > sec/min/hour.  (And of course for 1 hour per year, a
    > time hour=2 will *need* specification of isdst in order
    > to know which of the two 2:<min>:<sec> is meant.)  Also,
    > $wday and $yday (which are described as read-only) can
    > only be checked after validation or "in-ranging" of the
    > sec/min/hour/mday/mon components, so their simple
    > recycling will typically be incorrect.

    > That's why I had opted to *mainly* do full "balancing"
    > (in my sense), i.e., simultaneous both filling and
    > "in-ranging".
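To illustrate the point about $wday and $yday, here is a minimal
sketch, assuming the "UTC" time zone and a made-up single time with an
out-of-range 'sec'. The "in-ranging" is done by a round trip through
as.POSIXct(), which is just one way to get there and not necessarily
what balancePOSIXlt() does internally.

```r
## One time with out-of-range 'sec': 2016-12-06 21:45 plus 10000 seconds.
## wday = 2 (Tuesday) and yday = 340 describe Dec 6, i.e. the state
## *before* the extra seconds are carried over into min/hour/mday.
x <- .POSIXlt(list(sec = 10000, min = 45L, hour = 21L,
                   mday = 6L, mon = 11L, year = 116L,
                   wday = 2L, yday = 340L, isdst = 0L),
              tz = "UTC")

## "In-ranging" by a round trip POSIXlt -> POSIXct -> POSIXlt
norm <- as.POSIXlt(as.POSIXct(x), tz = "UTC")

format(norm, "%Y-%m-%d %H:%M:%S")       # "2016-12-07 00:31:40"
unclass(x)$wday;    unclass(x)$yday     # 2, 340 (stale)
unclass(norm)$wday; unclass(norm)$yday  # 3, 341 (recomputed)
```

So simply recycling the old wday/yday alongside an out-of-range 'sec'
would keep describing the wrong day.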

A few hours ago [R-devel svn rev 83156; 2022-10-22 10:18:38 +0200]
I committed an enhanced version of balancePOSIXlt() which
now has an optional 'fill.only = FALSE/TRUE' argument.
When TRUE (not the default), it will only do the "filling", i.e.,
recycling of less-than-full-length components, without any
"in-ranging" or much further validity checking.

Currently, almost all POSIXlt methods using balancePOSIXlt(),
notably 
		[.POSIXlt    and    [<-.POSIXlt

use  balancePOSIXlt(x, fill.only=TRUE ..)
and hence are almost as fast as previously (when they did no
balancing and sometimes gave wrong results or errored in the case
of partially filled POSIXlt).
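What "filling only" amounts to can be sketched in plain R. The helper
fill_only() below is hypothetical (it is not part of base R and not
the actual balancePOSIXlt() code); it just recycles every component to
the length of the longest one and leaves all values as they are. The
call to balancePOSIXlt() at the end is guarded, as it assumes an R
version that already has that function.

```r
## Hypothetical helper (NOT base R): recycle all components of a "ragged"
## POSIXlt to the length of the longest one, leaving the values untouched.
fill_only <- function(x) {
    stopifnot(inherits(x, "POSIXlt"))
    cl <- oldClass(x)
    comps <- unclass(x)
    n <- max(lengths(comps))
    comps[] <- lapply(comps, rep_len, length.out = n)
    class(comps) <- cl
    comps
}

## The ragged, partly out-of-range example from this thread
dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10, -Inf, NA)) + pi,
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))

lengths(unclass(dlt))             # ragged: 13, 1, 4, 1, 3, 1, 1, 1, 1
lengths(unclass(fill_only(dlt)))  # all 13

## In an R version that has balancePOSIXlt(), the real thing would be
if (exists("balancePOSIXlt", mode = "function"))
    print(lengths(unclass(balancePOSIXlt(dlt, fill.only = TRUE))))
```

Note that the out-of-range and non-finite 'sec' values survive
unchanged here; only the lengths are made equal.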



    >> By the way, Inf in 'sec' component is out-of-range!

    > Yes, the non-finite "values" {+/-Inf, NaN, NA} are all
    > "special", and we had decided to allow them for
    > compatibility with classes "Date" and "POSIXct".

    > BTW, a few days ago, I updated the
    > help("DateTimeClasses") page in R-devel to document a
    > bit more, notably that "ragged" and out-of-range POSIXlt
    > may exist...  see (the always +- current R-devel help
    > pages at)
    > https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html


    >> For 'gmtoff', NA or 0 should be put for unknown. A known
    >> 'gmtoff' may be positive, negative, or zero. The
    >> documentation says ‘gmtoff’ (Optional.) The offset in
    >> seconds from GMT: positive values are East of the
    >> meridian.  Usually ‘NA’ if unknown, but ‘0’ could mean
    >> unknown.


    >> dlt <- .POSIXlt(list(sec = c(-999, 10000 + c(1:10, -Inf, NA)) + pi,
    >>                      # "out of range", non-finite, fractions
    >>                      min = 45L, hour = c(21L, 3L, NA, 4L),
    >>                      mday = 6L, mon = c(11L, NA, 3L),
    >>                      year = 116L, wday = 2L, yday = 340L, isdst = 1L))

    >> as.POSIXct(dlt)[1] is NA on Linux with timezone without
    >> DST. For example, after Sys.setenv(TZ = "EST")

    > Hmm... I needed time to look at the above. Indeed, one
    > gets NA (and has in previous versions of R) in such a
    > case.

    > After applying balancePOSIXlt(), one no longer gets NA.
    > Are you proposing that we should do that (or possibly
    > simple recycling) in as.POSIXct.POSIXlt() ?
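For anyone who wants to try this on their own platform, a small
sketch follows. It uses only the first element of 'dlt', written out
by hand, and temporarily sets TZ = "EST" as in the report. The result
of the first conversion is platform dependent (NA was reported on
Linux), so no output is shown; the balancePOSIXlt() call is guarded,
as it assumes an R version that already has that function.

```r
## First element of 'dlt' on its own; isdst = 1 claims that DST is active
x1 <- .POSIXlt(list(sec = -999 + pi, min = 45L, hour = 21L,
                    mday = 6L, mon = 11L, year = 116L,
                    wday = 2L, yday = 340L, isdst = 1L))

old_tz <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "EST")     # a time zone without DST

ct_raw <- as.POSIXct(x1)   # reported to be NA on Linux; platform dependent
print(ct_raw)

## balancePOSIXlt() only exists in sufficiently recent versions of R
if (exists("balancePOSIXlt", mode = "function"))
    print(as.POSIXct(balancePOSIXlt(x1)))

## Restore the previous TZ setting
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```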

I am still waiting for comments (also by others) or other
remarks or answers on this question/topic..

Martin


