[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

Fri Sep 9 08:51:46 CEST 2016

Thank you, Gabe and Bill,

for taking up the discussion.

>>>>> William Dunlap <wdunlap at tibco.com>
>>>>>     on Thu, 8 Sep 2016 10:45:07 -0700 writes:

    > Prior to the mid-1990s, S did "length-0 OP length-n -> rep(NA, n)" and it
    > was changed to "length-0 OP length-n -> length-0" to avoid lots of problems
    > like any(x<0) being NA when length(x)==0.  Yes, people could code defensively
    > by putting lots of if(length(x)==0)... in their code, but that is tedious and
    > error-prone and creates really ugly code.

Yes, so the rule is basically

     length-0 OP <anything>  -> length-0
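
A minimal sketch of that rule for plain (non-array) vectors; the variable
names are only illustrative and not taken from any R sources:

```r
x <- numeric(0)          # a legitimate, but empty, numeric vector

r_arith <- x + 1:3       # length-0 OP length-3 -> numeric(0)
r_relop <- x < 0         # length-0 OP length-1 -> logical(0)

# Because the comparison is empty rather than NA, the summaries are
# well defined without any  if(length(x) == 0)  guard:
any_neg <- any(x < 0)    # FALSE
all_neg <- all(x < 0)    # TRUE (vacuously)
```

This is the situation Bill describes: any(x < 0) stays a usable FALSE
instead of turning into NA.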

Now to the case of NULL that Bill mentioned.
I agree that NULL is not at all the same thing as double(0) or logical(0),
*but* there have been quite a few cases where NULL is the
result of operations where, "for consistency", double(0) / logical(0) should
have been returned, and there are users who use NULL as the equivalent
of those, e.g., by initializing a (to be grown, yes, very inefficient!)
vector with NULL instead of with, say, double(0).

For these reasons, many operations that expect a "number-like"
(includes logical) atomic vector have treated NULL as such,
*and* parts of the {arith/logic/relop} OPs have done so already
in R "forever".
I would still argue that for these OPs, treating NULL as logical(0) {which
then may be promoted by the usual rules} is a good thing.
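
To make the "NULL as an empty vector" usage concrete, here is a small
sketch (illustrative names only).  It is restricted to arithmetic and
comparison; logic ('&', '|') is left out on purpose, since its treatment
of NULL is one of the things that differs between R versions.

```r
# Growing a vector from NULL (inefficient, but common in user code):
x <- NULL
for (i in 1:3) x <- c(x, i^2)

# The same loop started from an explicitly typed empty vector:
y <- double(0)
for (i in 1:3) y <- c(y, i^2)

same <- identical(x, y)     # both are c(1, 4, 9)

# In arithmetic and comparison, NULL behaves like a length-0 vector:
a <- NULL + 1               # numeric(0)
b <- NULL < 1:3             # logical(0)
```

Code written in the first style only keeps working in its zero-iteration
case if NULL is accepted where a number-like vector is expected.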

    > Is your suggestion to leave the length-0 OP length-1 case as it is but make
    > length-0 OP length-two-or-higher an error or warning (akin to the length-2
    > OP length-3 case)?

That's exactly one thing the current changes eliminated:
arithmetic (only; not logic or relop) did treat the length-1
case (for arrays!) differently from the length-GE-2 case.  And I strongly
believe that this is very wrong and counter to the predominant
recycling rules in (S and) R.
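
A short sketch of the length-1 array case (names are illustrative).  It
only uses the behaviour that is the same before and after the changes:
comparison with a longer non-array is an error, and dropping the dim
attribute explicitly turns the 1x1 matrix into an ordinary scalar.

```r
m1 <- matrix(1, 1, 1)    # a 1x1 matrix, i.e. a length-1 array

# Comparison of a length-1 array with a longer non-array is an error:
res <- tryCatch(m1 < 1:2, error = function(e) "error")

# Dropping the dim attribute first makes the "scalar" intent explicit,
# and then ordinary recycling applies to arithmetic and relop alike:
s_arith <- c(m1) + 1:2           # 2 3
s_relop <- as.vector(m1) < 1:2   # FALSE TRUE
```

Whether the unconverted m1 + 1:2 silently recycles, warns or fails
depends on the R version, which is why it is not evaluated here.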

    > By the way, the all(numeric(0)<0) is TRUE, as is all(numeric()>0), by de
    > Morgan's rule, but that is not really relevant here.

    > Bill Dunlap
    > TIBCO Software
    > wdunlap tibco.com

    > On Thu, Sep 8, 2016 at 10:22 AM, Gabriel Becker <gmbecker at ucdavis.edu>
    > wrote:

    >> 
    >> 
    >> On Thu, Sep 8, 2016 at 10:05 AM, William Dunlap <wdunlap at tibco.com> wrote:
    >> 
    >>> Shouldn't binary operators (arithmetic and logical) throw an error
    >>> when one operand is NULL (or other type that doesn't make sense)?  This is
    >>> a different case than a zero-length operand of a legitimate type.  E.g.,
    >>> any(x < 0)
    >>> should return FALSE if x is number-like and length(x)==0 but give an
    >>> error if x is NULL.
    >>> 
    >> Bill,
    >> 
    >> That is a good point. I can see the argument for this in the case that the
    >> non-zero length is 1. I'm not sure which is better though. If we switch
    >> any() to all(), things get murky.
    >> 
    >> Mathematically, all(x<0) is TRUE if x is length 0 (as are all(x==0), and
    >> all(x>0)), but the likelihood of this being a thought-bug on the author's
    >> part is exceedingly high, imho. So the desirable behavior seems to depend
    >> on the angle we look at it from.
    >> 
    >> My personal opinion is that x < y with length(x)==0 should fail if length(y)
    >> > 1, at least, and I'd be for it being an error even if y is length 1,
    >> though I do acknowledge this is more likely (though still quite unlikely
    >> imho) to be the intended behavior.
    >> 
    >> ~G
    >> 
    >>> 
    >>> I.e., I think the type check should be done before the length check.
    >>> 
    >>> 
    >>> Bill Dunlap
    >>> TIBCO Software
    >>> wdunlap tibco.com
    >>> 
    >>> On Thu, Sep 8, 2016 at 8:43 AM, Gabriel Becker <gmbecker at ucdavis.edu>
    >>> wrote:
    >>> 
    >>>> Martin,
    >>>> 
    >>>> Like Robin and Oliver I think this type of edge-case consistency is
    >>>> important and that it's fantastic that R-core - and you personally - are
    >>>> willing to tackle some of these "gotcha" behaviors. "Little" stuff like
    >>>> this really does combine to go a long way to making R better and better.
    >>>> 
    >>>> I do wonder a  bit about the
    >>>> 
    >>>> x = 1:2
    >>>> 
    >>>> y = NULL
    >>>> 
    >>>> x < y
    >>>> 
    >>>> case.
    >>>> 
    >>>> Returning a logical of length 0 is more backwards compatible, but is it
    >>>> ever what the author actually intended? I have trouble thinking of a case
    >>>> where that less-than didn't carry an implicit assumption that y was
    >>>> non-NULL.  I can say that in my own code, I've never hit that behavior
    >>>> in a case that wasn't an error.
    >>>> 
    >>>> My vote (unless someone else points out a compelling use for the behavior)
    >>>> is for this to throw an error. As a developer, I'd rather things like this
    >>>> break so the bug in my logic is visible, rather than propagating as the
    >>>> 0-length logical is &'ed or |'ed with other logical vectors, or used to
    >>>> subset, or (in the case it should be length 1) passed to if() (if throws
    >>>> an error now, but the rest would silently "work").
    >>>> 
    >>>> Best,
    >>>> ~G
    >>>> 
    >>>> On Thu, Sep 8, 2016 at 3:49 AM, Martin Maechler <maechler at stat.math.ethz.ch>
    >>>> wrote:
    >>>> 
    >>>> > >>>>> robin hankin <hankin.robin at gmail.com>
    >>>> > >>>>>     on Thu, 8 Sep 2016 10:05:21 +1200 writes:
    >>>> >
    >>>> >     > Martin I'd like to make a comment; I think that R's
    >>>> >     > behaviour on 'edge' cases like this is an important thing
    >>>> >     > and it's great that you are working on it.
    >>>> >
    >>>> >     > I make heavy use of zero-extent arrays, chiefly because
    >>>> >     > the dimnames are an efficient and logical way to keep
    >>>> >     > track of certain types of information.
    >>>> >
    >>>> >     > If I have, for example,
    >>>> >
    >>>> >     > a <- array(0,c(2,0,2))
    >>>> >     > dimnames(a) <- list(name=c('Mike','Kevin'), NULL, item=c("hat","scarf"))
    >>>> >
    >>>> >
    >>>> >     > Then in R-3.3.1, 70800 I get
    >>>> >
    >>>> >     > > a > 0
    >>>> >     > logical(0)
    >>>> >     > >
    >>>> >
    >>>> >     > But in 71219 I get
    >>>> >
    >>>> >     > > a > 0
    >>>> >     > , , item = hat
    >>>> >
    >>>> >
    >>>> >     > name
    >>>> >     > Mike
    >>>> >     > Kevin
    >>>> >
    >>>> >     > , , item = scarf
    >>>> >
    >>>> >
    >>>> >     > name
    >>>> >     > Mike
    >>>> >     > Kevin
    >>>> >
    >>>> >     > (which is an empty logical array that holds the names of the people and
    >>>> >     > their clothes). I find the behaviour of 71219 very much preferable because
    >>>> >     > there is no reason to discard the information in the dimnames.
    >>>> >
    >>>> > Thanks a lot, Robin, (and Oliver) !
    >>>> >
    >>>> > Yes, the above is such a case where the new behavior makes much sense.
    >>>> > And this behavior remains identical after the 71222 amendment.
    >>>> >
    >>>> > Martin
    >>>> >
    >>>> >     > Best wishes
    >>>> >     > Robin
    >>>> >
    >>>> >
    >>>> >
    >>>> >
    >>>> >     > On Wed, Sep 7, 2016 at 9:49 PM, Martin Maechler <maechler at stat.math.ethz.ch>
    >>>> >     > wrote:
    >>>> >
    >>>> >     >> >>>>> Martin Maechler <maechler at stat.math.ethz.ch>
    >>>> >     >> >>>>>     on Tue, 6 Sep 2016 22:26:31 +0200 writes:
    >>>> >     >>
    >>>> >     >> > Yesterday, changes to R's development version were committed, relating
    >>>> >     >> > to arithmetic, logic ('&' and '|') and
    >>>> >     >> > comparison/relational ('<', '==') binary operators
    >>>> >     >> > which in NEWS are described as
    >>>> >     >>
    >>>> >     >> > SIGNIFICANT USER-VISIBLE CHANGES:
    >>>> >     >>
    >>>> >     >> > [.............]
    >>>> >     >>
    >>>> >     >> > • Arithmetic, logic (‘&’, ‘|’) and comparison (aka
    >>>> >     >> > ‘relational’, e.g., ‘<’, ‘==’) operations with arrays now
    >>>> >     >> > behave consistently, notably for arrays of length zero.
    >>>> >     >>
    >>>> >     >> > Arithmetic between length-1 arrays and longer non-arrays had
    >>>> >     >> > silently dropped the array attributes and recycled.  This
    >>>> >     >> > now gives a warning and will signal an error in the future,
    >>>> >     >> > as it has always for logic and comparison operations in
    >>>> >     >> > these cases (e.g., compare ‘matrix(1,1) + 2:3’ and
    >>>> >     >> > ‘matrix(1,1) < 2:3’).
    >>>> >     >>
    >>>> >     >> > As the above "visually suggests" one could think of the changes
    >>>> >     >> > falling mainly into two groups,
    >>>> >     >> > 1) <0-extent array>  (op)     <non-array>
    >>>> >     >> > 2) <1-extent array>  (arith)  <non-array of length != 1>
    >>>> >     >>
    >>>> >     >> > These changes are partly non-back compatible and may break
    >>>> >     >> > existing code.  We believe that the internal consistency gained
    >>>> >     >> > from the changes is worth the few places with problems.
    >>>> >     >>
    >>>> >     >> > We expect some package maintainers (10-20, or even more?) need
    >>>> >     >> > to adapt their code.
    >>>> >     >>
    >>>> >     >> > Case '2)' above mainly results in a new warning, e.g.,
    >>>> >     >>
    >>>> >     >> >> matrix(1,1) + 1:2
    >>>> >     >> > [1] 2 3
    >>>> >     >> > Warning message:
    >>>> >     >> > In matrix(1, 1) + 1:2 :
    >>>> >     >> > dropping dim() of array of length one.  Will become ERROR
    >>>> >     >> >>
    >>>> >     >>
    >>>> >     >> > whereas '1)' gives errors in cases the result silently was a
    >>>> >     >> > vector of length zero, or also keeps array (dim & dimnames) in
    >>>> >     >> > cases these were silently dropped.
    >>>> >     >>
    >>>> >     >> > The following is a "heavily" commented  R script showing (all ?)
    >>>> >     >> > the important cases with changes :
    >>>> >     >>
    >>>> >     >> > ----------------------------------------------------------------------------
    >>>> >     >>
    >>>> >     >> > (m <- cbind(a=1[0], b=2[0]))
    >>>> >     >> > Lm <- m; storage.mode(Lm) <- "logical"
    >>>> >     >> > Im <- m; storage.mode(Im) <- "integer"
    >>>> >     >>
    >>>> >     >> > ## 1. -------------------------
    >>>> >     >> > try( m & NULL ) # in R <= 3.3.x :
    >>>> >     >> > ## Error in m & NULL :
    >>>> >     >> > ##  operations are possible only for numeric, logical or complex types
    >>>> >     >> > ##
    >>>> >     >> > ## gives 'Lm' in R >= 3.4.0
    >>>> >     >>
    >>>> >     >> > ## 2. -------------------------
    >>>> >     >> > m + 2:3 ## gave numeric(0), now remains matrix identical to  m
    >>>> >     >> > Im + 2:3 ## gave integer(0), now remains matrix identical to Im (integer)
    >>>> >     >>
    >>>> >     >> > m > 1      ## gave logical(0), now remains matrix identical to Lm (logical)
    >>>> >     >> > m > 0.1[0] ##  ditto
    >>>> >     >> > m > NULL   ##  ditto
    >>>> >     >>
    >>>> >     >> > ## 3. -------------------------
    >>>> >     >> > mm <- m[,c(1:2,2:1,2)]
    >>>> >     >> > try( m == mm ) ## now gives error   "non-conformable arrays",
    >>>> >     >> > ## but gave logical(0) in R <= 3.3.x
    >>>> >     >>
    >>>> >     >> > ## 4. -------------------------
    >>>> >     >> > str( Im + NULL)  ## gave "num", now gives "int"
    >>>> >     >>
    >>>> >     >> > ## 5. -------------------------
    >>>> >     >> > ## special case for arithmetic w/ length-1 array
    >>>> >     >> > (m1 <- matrix(1,1,1, dimnames=list("Ro","col")))
    >>>> >     >> > (m2 <- matrix(1,2,1, dimnames=list(c("A","B"),"col")))
    >>>> >     >>
    >>>> >     >> > m1 + 1:2  # ->  2:3  but now with warning to  "become ERROR"
    >>>> >     >> > tools::assertError(m1 & 1:2)# ERR: dims [product 1] do not match the length of object [2]
    >>>> >     >> > tools::assertError(m1 < 1:2)# ERR:                  (ditto)
    >>>> >     >> > ##
    >>>> >     >> > ## non-0-length arrays combined with {NULL or double() or ...} *fail*
    >>>> >     >>
    >>>> >     >> > ### Length-1 arrays:  Arithmetic with |vectors| > 1  treated array as scalar
    >>>> >     >> > m1 + NULL # gave  numeric(0) in R <= 3.3.x --- still, *but* w/ warning to "be ERROR"
    >>>> >     >> > try(m1 > NULL)    # gave  logical(0) in R <= 3.3.x --- an *error* now in R >= 3.4.0
    >>>> >     >> > tools::assertError(m1 & NULL)    # gave and gives error
    >>>> >     >> > tools::assertError(m1 | double())# ditto
    >>>> >     >> > ## m2 was slightly different:
    >>>> >     >> > tools::assertError(m2 + NULL)
    >>>> >     >> > tools::assertError(m2 & NULL)
    >>>> >     >> > try(m2 == NULL) ## was logical(0) in R <= 3.3.x; now error as above!
    >>>> >     >>
    >>>> >     >> > ----------------------------------------------------------------------------
    >>>> >     >>
    >>>> >     >>
    >>>> >     >> > Note that in R's own  'nls'  sources, there was one case of
    >>>> >     >> > situation '2)' above, i.e. a  1x1-matrix was used as a "scalar".
    >>>> >     >>
    >>>> >     >> > In such cases, you should explicitly coerce it to a vector,
    >>>> >     >> > either ("self-explainingly") by  as.vector(.), or as I did in
    >>>> >     >> > the nls case  by  c(.) :  The latter is much less
    >>>> >     >> > self-explaining, but nicer to read in mathematical formulae, and
    >>>> >     >> > currently also more efficient because it is a .Primitive.
    >>>> >     >>
    >>>> >     >> > Please use R-devel with your code, and let us know if you see
    >>>> >     >> > effects that seem adverse.
    >>>> >     >>
    >>>> >     >> I've been slightly surprised (or even "frustrated") by the empty
    >>>> >     >> reaction on our R-devel list to this post.
    >>>> >     >>
    >>>> >     >> I would have expected some critique, maybe even some praise,
    >>>> >     >> ... in any case some sign people are "thinking along" (as we say
    >>>> >     >> in German).
    >>>> >     >>
    >>>> >     >> In the meantime, I've actually thought along the one case which
    >>>> >     >> is last above:  The <op>  (binary operation) between a
    >>>> >     >> non-0-length array and a 0-length vector (and NULL which should
    >>>> >     >> be treated like a 0-length vector):
    >>>> >     >>
    >>>> >     >> R <= 3.3.1  *is* quite inconsistent with these:
    >>>> >     >>
    >>>> >     >>
    >>>> >     >> and my proposal above (implemented in R-devel, since Sep.5) would give an
    >>>> >     >> error for all these, but instead, R really could be more lenient here:
    >>>> >     >> A 0-length result is ok, and it should *not* inherit the array
    >>>> >     >> (dim, dimnames), since the array is not of length 0. So instead
    >>>> >     >> of the above [for the very last part only!!], we would aim for
    >>>> >     >> the following. These *all* give an error in current R-devel,
    >>>> >     >> with the exception of 'm1 + NULL' which "only" gives a "bad
    >>>> >     >> warning" :
    >>>> >     >>
    >>>> >     >> ------------------------
    >>>> >     >>
    >>>> >     >> m1 <- matrix(1,1)
    >>>> >     >> m2 <- matrix(1,2)
    >>>> >     >>
    >>>> >     >> m1 + NULL #    numeric(0) in R <= 3.3.x ---> OK ?!
    >>>> >     >> m1 > NULL #    logical(0) in R <= 3.3.x ---> OK ?!
    >>>> >     >> try(m1 & NULL)    # ERROR in R <= 3.3.x ---> change to logical(0) ?!
    >>>> >     >> try(m1 | double())# ERROR in R <= 3.3.x ---> change to logical(0) ?!
    >>>> >     >> ## m2 slightly different:
    >>>> >     >> try(m2 + NULL)  # ERROR in R <= 3.3.x ---> change to double(0) ?!
    >>>> >     >> try(m2 & NULL)  # ERROR in R <= 3.3.x ---> change to logical(0)  ?!
    >>>> >     >> m2 == NULL # logical(0) in R <= 3.3.x ---> OK ?!
    >>>> >     >>
    >>>> >     >> ------------------------
    >>>> >     >>
    >>>> >     >> This would be slightly more back-compatible than the currently
    >>>> >     >> implemented proposal. Everything else I said remains true, and
    >>>> >     >> I'm pretty sure most changes needed in packages would remain to be done.
    >>>> >     >>
    >>>> >     >> Opinions ?
    >>>> >     >>
    >>>> >     >>
    >>>> >     >>
    >>>> >     >> > In some cases where R-devel now gives an error but did not
    >>>> >     >> > previously, we could contemplate giving another  "warning
    >>>> >     >> > .... 'to become ERROR'" if there was too much breakage, though
    >>>> >     >> > I don't expect that.
    >>>> >     >>
    >>>> >     >>
    >>>> >     >> > For the R Core Team,
    >>>> >     >>
    >>>> >     >> > Martin Maechler,
    >>>> >     >> > ETH Zurich
    >>>> >     >>
    >>>> >     >> ______________________________________________
    >>>> >     >> R-devel at r-project.org mailing list
    >>>> >     >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>> >     >>
    >>>> >
    >>>> >
    >>>> >
    >>>> >     > --
    >>>> >     > Robin Hankin
    >>>> >     > Neutral theorist
    >>>> >     > hankin.robin at gmail.com
    >>>> >
    >>>> >
    >>>> 
    >>>> 
    >>>> 
    >>>> --
    >>>> Gabriel Becker, PhD
    >>>> Associate Scientist (Bioinformatics)
    >>>> Genentech Research
    >>>> 
    >>>> 
    >>> 
    >>> 
    >> 
    >> 
    >> --
    >> Gabriel Becker, PhD
    >> Associate Scientist (Bioinformatics)
    >> Genentech Research
    >> 
