[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays
Martin Maechler
maechler at stat.math.ethz.ch
Fri Sep 9 08:51:46 CEST 2016
Thank you, Gabe and Bill,
for taking up the discussion.
>>>>> William Dunlap <wdunlap at tibco.com>
>>>>> on Thu, 8 Sep 2016 10:45:07 -0700 writes:
> Prior to the mid-1990s, S did "length-0 OP length-n -> rep(NA, n)" and it
> was changed to "length-0 OP length-n -> length-0" to avoid lots of problems
> like any(x<0) being NA when length(x)==0. Yes, people could code defensively
> by putting lots of if(length(x)==0)... in their code, but that is tedious
> and error-prone and creates really ugly code.
Yes, so actually, basically
length-0 OP <anything> -> length-0
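
To make that rule concrete, here is a minimal sketch using only plain
(non-array) vectors, for which the behaviour should be the same in released
R and in R-devel. The names x, cmp and sum3 are purely illustrative.

```r
# A zero-length operand of an ordinary atomic type.
x <- numeric(0)

cmp  <- x < 0      # zero-length logical, not NA
sum3 <- x + 1:3    # zero-length numeric; the length-3 operand is not kept

length(cmp)        # 0
length(sum3)       # 0

# The motivating case: no defensive if (length(x) == 0) is needed.
no_negatives_found <- !any(x < 0)   # any() of an empty logical is FALSE
all_negative       <- all(x < 0)    # all() of an empty logical is TRUE
```

Note that all(x < 0) is TRUE here only vacuously, which is the case Gabe
worries about further down.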
Now the case of NULL that Bill mentioned.
I agree that NULL is not at all the same thing as double(0) or logical(0),
*but* there have been quite a few cases, where NULL is the
result of operations where "for consistency" double(0) / logical(0) should have
been.... and there are the users who use NULL as the equivalent
of those, e.g., by initializing a (to be grown, yes, very inefficient!)
vector with NULL instead of with say double(0).
For these reasons, many operations that expect a "number-like"
(includes logical) atomic vector have treated NULL as such...
*and* parts of the {arith/logic/relop} OPs have done so already
in R "forever".
I still would argue that for these OPs, treating NULL as logical(0) {which
then may be promoted by the usual rules} is a good thing.
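
As a small illustration of why NULL ends up in such operations, consider the
(inefficient, but common) idiom of growing a vector from NULL. This is only a
sketch with plain vectors; the names v and empty are made up for the example.

```r
# Growing a vector that was initialized with NULL instead of double(0).
v <- NULL
for (i in 1:3) v <- c(v, i^2)    # c(NULL, 1) is just 1
v                                # 1 4 9

# If the loop never runs, the "vector" is still NULL ...
empty <- NULL

# ... and arithmetic / comparison with a number treat it like a
# zero-length vector rather than signalling an error.
plus <- empty + 1                # zero-length double
gt   <- empty > 0                # zero-length logical
any_positive <- any(gt)          # FALSE
```

With the "type check first" alternative, the last three lines would have to
signal an error instead.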
> Is your suggestion to leave the length-0 OP length-1 case as it is but make
> length-0 OP length-two-or-higher an error or warning (akin to the length-2
> OP length-3 case)?
That's exactly one thing the current changes eliminated:
arithmetic (only; not logic, or relop) did treat the length-1
case (for arrays!) differently from the length-GE-2 case. And I strongly
believe that this is very wrong and counter to the predominant
recycling rules in (S and) R.
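
For comparison, here is a sketch of what the predominant recycling rules give
once the dim attribute is stripped explicitly with c(.) or as.vector(.), so
that only plain vectors are involved. It deliberately avoids array operands,
whose behaviour is exactly what is under discussion; the object names are
illustrative only.

```r
m1 <- matrix(1, 1, 1)   # length-1 array
m2 <- matrix(1, 2, 1)   # length-2 array
z  <- numeric(0)        # zero-length vector

# After dropping dim(), length-1 and length-2 operands behave alike:
r1 <- c(m1) + z           # zero-length double
r2 <- as.vector(m2) + z   # zero-length double

# A 1x1 matrix used as a "scalar": coerce it explicitly, then recycle.
s <- c(m1) + 1:2          # 2 3
```

Both r1 and r2 have length zero, i.e., there is no special case for the
length-1 operand.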
> By the way, the all(numeric(0)<0) is TRUE, as is all(numeric()>0), by de
> Morgan's rule, but that is not really relevant here.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> On Thu, Sep 8, 2016 at 10:22 AM, Gabriel Becker <gmbecker at ucdavis.edu>
> wrote:
>>
>>
>> On Thu, Sep 8, 2016 at 10:05 AM, William Dunlap <wdunlap at tibco.com> wrote:
>>
>>> Shouldn't binary operators (arithmetic and logical) throw an error
>>> when one operand is NULL (or other type that doesn't make sense)? This is
>>> a different case than a zero-length operand of a legitimate type. E.g.,
>>> any(x < 0)
>>> should return FALSE if x is number-like and length(x)==0 but give an
>>> error if x is NULL.
>>>
>> Bill,
>>
>> That is a good point. I can see the argument for this in the case that the
>> non-zero length is 1. I'm not sure which is better though. If we switch
>> any() to all(), things get murky.
>>
>> Mathematically, all(x<0) is TRUE if x is length 0 (as are all(x==0), and
>> all(x>0)), but the likelihood of this being a thought-bug on the author's
>> part is exceedingly high, imho. So the desirable behavior seems to depend
>> on the angle we look at it from.
>>
>> My personal opinion is that x < y with length(x)==0 should fail if length(y)
>> > 1, at least, and I'd be for it being an error even if y is length 1,
>> though I do acknowledge this is more likely (though still quite unlikely
>> imho) to be the intended behavior.
>>
>> ~G
>>
>>>
>>> I.e., I think the type check should be done before the length check.
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Thu, Sep 8, 2016 at 8:43 AM, Gabriel Becker <gmbecker at ucdavis.edu>
>>> wrote:
>>>
>>>> Martin,
>>>>
>>>> Like Robin and Oliver I think this type of edge-case consistency is
>>>> important and that it's fantastic that R-core - and you personally - are
>>>> willing to tackle some of these "gotcha" behaviors. "Little" stuff like
>>>> this really does combine to go a long way to making R better and better.
>>>>
>>>> I do wonder a bit about the
>>>>
>>>> x = 1:2
>>>>
>>>> y = NULL
>>>>
>>>> x < y
>>>>
>>>> case.
>>>>
>>>> Returning a logical of length 0 is more backwards compatible, but is it
>>>> ever what the author actually intended? I have trouble thinking of a case
>>>> where that less-than didn't carry an implicit assumption that y was
>>>> non-NULL. I can say that in my own code, I've never hit that behavior in a
>>>> case that wasn't an error.
>>>>
>>>> My vote (unless someone else points out a compelling use for the behavior)
>>>> is for this to throw an error. As a developer, I'd rather things like this
>>>> break so the bug in my logic is visible, rather than propagating as the
>>>> 0-length logical is &'ed or |'ed with other logical vectors, or used to
>>>> subset, or (in the case it should be length 1) passed to if() (if throws an
>>>> error now, but the rest would silently "work").
>>>>
>>>> Best,
>>>> ~G
>>>>
>>>> On Thu, Sep 8, 2016 at 3:49 AM, Martin Maechler <
>>>> maechler at stat.math.ethz.ch>
>>>> wrote:
>>>>
>>>> > >>>>> robin hankin <hankin.robin at gmail.com>
>>>> > >>>>> on Thu, 8 Sep 2016 10:05:21 +1200 writes:
>>>> >
>>>> > > Martin I'd like to make a comment; I think that R's
>>>> > > behaviour on 'edge' cases like this is an important thing
>>>> > > and it's great that you are working on it.
>>>> >
>>>> > > I make heavy use of zero-extent arrays, chiefly because
>>>> > > the dimnames are an efficient and logical way to keep
>>>> > > track of certain types of information.
>>>> >
>>>> > > If I have, for example,
>>>> >
>>>> > > a <- array(0,c(2,0,2))
>>>> > > dimnames(a) <- list(name=c('Mike','Kevin'), NULL, item=c("hat","scarf"))
>>>> >
>>>> >
>>>> > > Then in R-3.3.1, 70800 I get
>>>> >
>>>> > a> 0
>>>> > > logical(0)
>>>> > >>
>>>> >
>>>> > > But in 71219 I get
>>>> >
>>>> > a> 0
>>>> > > , , item = hat
>>>> >
>>>> >
>>>> > > name
>>>> > > Mike
>>>> > > Kevin
>>>> >
>>>> > > , , item = scarf
>>>> >
>>>> >
>>>> > > name
>>>> > > Mike
>>>> > > Kevin
>>>> >
>>>> > > (which is an empty logical array that holds the names of the people
>>>> > > and their clothes). I find the behaviour of 71219 very much preferable
>>>> > > because there is no reason to discard the information in the dimnames.
>>>> >
>>>> > Thanks a lot, Robin, (and Oliver) !
>>>> >
>>>> > Yes, the above is such a case where the new behavior makes much sense.
>>>> > And this behavior remains identical after the 71222 amendment.
>>>> >
>>>> > Martin
>>>> >
>>>> > > Best wishes
>>>> > > Robin
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > > On Wed, Sep 7, 2016 at 9:49 PM, Martin Maechler <
>>>> > maechler at stat.math.ethz.ch>
>>>> > > wrote:
>>>> >
>>>> > >> >>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>> > >> >>>>> on Tue, 6 Sep 2016 22:26:31 +0200 writes:
>>>> > >>
>>>> > >> > Yesterday, changes to R's development version were committed, relating
>>>> > >> > to arithmetic, logic ('&' and '|') and
>>>> > >> > comparison/relational ('<', '==') binary operators
>>>> > >> > which in NEWS are described as
>>>> > >>
>>>> > >> > SIGNIFICANT USER-VISIBLE CHANGES:
>>>> > >>
>>>> > >> > [.............]
>>>> > >>
>>>> > >> > • Arithmetic, logic (‘&’, ‘|’) and comparison (aka
>>>> > >> > ‘relational’, e.g., ‘<’, ‘==’) operations with arrays now
>>>> > >> > behave consistently, notably for arrays of length zero.
>>>> > >>
>>>> > >> > Arithmetic between length-1 arrays and longer non-arrays had
>>>> > >> > silently dropped the array attributes and recycled. This
>>>> > >> > now gives a warning and will signal an error in the future,
>>>> > >> > as it has always for logic and comparison operations in
>>>> > >> > these cases (e.g., compare ‘matrix(1,1) + 2:3’ and
>>>> > >> > ‘matrix(1,1) < 2:3’).
>>>> > >>
>>>> > >> > As the above "visually suggests" one could think of the changes
>>>> > >> > falling mainly into two groups,
>>>> > >> > 1) <0-extent array> (op) <non-array>
>>>> > >> > 2) <1-extent array> (arith) <non-array of length != 1>
>>>> > >>
>>>> > >> > These changes are partly non-back compatible and may break
>>>> > >> > existing code. We believe that the internal consistency gained
>>>> > >> > from the changes is worth the few places with problems.
>>>> > >>
>>>> > >> > We expect some package maintainers (10-20, or even more?) need
>>>> > >> > to adapt their code.
>>>> > >>
>>>> > >> > Case '2)' above mainly results in a new warning, e.g.,
>>>> > >>
>>>> > >> >> matrix(1,1) + 1:2
>>>> > >> > [1] 2 3
>>>> > >> > Warning message:
>>>> > >> > In matrix(1, 1) + 1:2 :
>>>> > >> > dropping dim() of array of length one. Will become ERROR
>>>> > >> >>
>>>> > >>
>>>> > >> > whereas '1)' gives errors in cases the result silently was a
>>>> > >> > vector of length zero, or also keeps array (dim & dimnames) in
>>>> > >> > cases these were silently dropped.
>>>> > >>
>>>> > >> > The following is a "heavily" commented R script showing (all ?)
>>>> > >> > the important cases with changes :
>>>> > >>
>>>> > >> > ------------------------------------------------------------
>>>> > >> ----------------
>>>> > >>
>>>> > >> > (m <- cbind(a=1[0], b=2[0]))
>>>> > >> > Lm <- m; storage.mode(Lm) <- "logical"
>>>> > >> > Im <- m; storage.mode(Im) <- "integer"
>>>> > >>
>>>> > >> > ## 1. -------------------------
>>>> > >> > try( m & NULL ) # in R <= 3.3.x :
>>>> > >> > ## Error in m & NULL :
>>>> > >> > ## operations are possible only for numeric, logical or complex types
>>>> > >> > ##
>>>> > >> > ## gives 'Lm' in R >= 3.4.0
>>>> > >>
>>>> > >> > ## 2. -------------------------
>>>> > >> > m + 2:3 ## gave numeric(0), now remains matrix identical to m
>>>> > >> > Im + 2:3 ## gave integer(0), now remains matrix identical to Im (integer)
>>>> > >>
>>>> > >> > m > 1 ## gave logical(0), now remains matrix identical to Lm (logical)
>>>> > >> > m > 0.1[0] ## ditto
>>>> > >> > m > NULL ## ditto
>>>> > >>
>>>> > >> > ## 3. -------------------------
>>>> > >> > mm <- m[,c(1:2,2:1,2)]
>>>> > >> > try( m == mm ) ## now gives error "non-conformable arrays",
>>>> > >> > ## but gave logical(0) in R <= 3.3.x
>>>> > >>
>>>> > >> > ## 4. -------------------------
>>>> > >> > str( Im + NULL) ## gave "num", now gives "int"
>>>> > >>
>>>> > >> > ## 5. -------------------------
>>>> > >> > ## special case for arithmetic w/ length-1 array
>>>> > >> > (m1 <- matrix(1,1,1, dimnames=list("Ro","col")))
>>>> > >> > (m2 <- matrix(1,2,1, dimnames=list(c("A","B"),"col")))
>>>> > >>
>>>> > >> > m1 + 1:2 # -> 2:3 but now with warning to "become ERROR"
>>>> > >> > tools::assertError(m1 & 1:2)# ERR: dims [product 1] do not match the length of object [2]
>>>> > >> > tools::assertError(m1 < 1:2)# ERR: (ditto)
>>>> > >> > ##
>>>> > >> > ## non-0-length arrays combined with {NULL or double() or ...} *fail*
>>>> > >>
>>>> > >> > ### Length-1 arrays: Arithmetic with |vectors| > 1 treated array as scalar
>>>> > >> > m1 + NULL # gave numeric(0) in R <= 3.3.x --- still, *but* w/ warning to "be ERROR"
>>>> > >> > try(m1 > NULL) # gave logical(0) in R <= 3.3.x --- an *error* now in R >= 3.4.0
>>>> > >> > tools::assertError(m1 & NULL) # gave and gives error
>>>> > >> > tools::assertError(m1 | double())# ditto
>>>> > >> > ## m2 was slightly different:
>>>> > >> > tools::assertError(m2 + NULL)
>>>> > >> > tools::assertError(m2 & NULL)
>>>> > >> > try(m2 == NULL) ## was logical(0) in R <= 3.3.x; now error as above!
>>>> > >>
>>>> > >> > ------------------------------------------------------------
>>>> > >> ----------------
>>>> > >>
>>>> > >>
>>>> > >> > Note that in R's own 'nls' sources, there was one case of
>>>> > >> > situation '2)' above, i.e. a 1x1-matrix was used as a "scalar".
>>>> > >>
>>>> > >> > In such cases, you should explicitly coerce it to a vector,
>>>> > >> > either ("self-explainingly") by as.vector(.), or as I did in
>>>> > >> > the nls case by c(.) : The latter is much less
>>>> > >> > self-explaining, but nicer to read in mathematical formulae,
>>>> and
>>>> > >> > currently also more efficient because it is a .Primitive.
>>>> > >>
>>>> > >> > Please use R-devel with your code, and let us know if you see
>>>> > >> > effects that seem adverse.
>>>> > >>
>>>> > >> I've been slightly surprised (or even "frustrated") by the empty
>>>> > >> reaction on our R-devel list to this post.
>>>> > >>
>>>> > >> I would have expected some critique, maybe even some praise,
>>>> > >> ... in any case some sign people are "thinking along" (as we say
>>>> > >> in German).
>>>> > >>
>>>> > >> In the meantime, I've actually thought along the one case which
>>>> > >> is last above: The <op> (binary operation) between a
>>>> > >> non-0-length array and a 0-length vector (and NULL which should
>>>> > >> be treated like a 0-length vector):
>>>> > >>
>>>> > >> R <= 3.3.1 *is* quite inconsistent with these:
>>>> > >>
>>>> > >>
>>>> > >> and my proposal above (implemented in R-devel, since Sep.5) would give an
>>>> > >> error for all these, but instead, R really could be more lenient here:
>>>> > >> A 0-length result is ok, and it should *not* inherit the array
>>>> > >> (dim, dimnames), since the array is not of length 0. So instead
>>>> > >> of the above [for the very last part only!!], we would aim for
>>>> > >> the following. These *all* give an error in current R-devel,
>>>> > >> with the exception of 'm1 + NULL' which "only" gives a "bad
>>>> > >> warning" :
>>>> > >>
>>>> > >> ------------------------
>>>> > >>
>>>> > >> m1 <- matrix(1,1)
>>>> > >> m2 <- matrix(1,2)
>>>> > >>
>>>> > >> m1 + NULL # numeric(0) in R <= 3.3.x ---> OK ?!
>>>> > >> m1 > NULL # logical(0) in R <= 3.3.x ---> OK ?!
>>>> > >> try(m1 & NULL) # ERROR in R <= 3.3.x ---> change to logical(0) ?!
>>>> > >> try(m1 | double())# ERROR in R <= 3.3.x ---> change to logical(0) ?!
>>>> > >> ## m2 slightly different:
>>>> > >> try(m2 + NULL) # ERROR in R <= 3.3.x ---> change to double(0) ?!
>>>> > >> try(m2 & NULL) # ERROR in R <= 3.3.x ---> change to logical(0) ?!
>>>> > >> m2 == NULL # logical(0) in R <= 3.3.x ---> OK ?!
>>>> > >>
>>>> > >> ------------------------
>>>> > >>
>>>> > >> This would be slightly more back-compatible than the currently
>>>> > >> implemented proposal. Everything else I said remains true, and
>>>> > >> I'm pretty sure most changes needed in packages would remain to be done.
>>>> > >>
>>>> > >> Opinions ?
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> > In some case where R-devel now gives an error but did not
>>>> > >> > previously, we could contemplate giving another "warning
>>>> > >> > .... 'to become ERROR'" if there was too much breakage, though
>>>> > >> > I don't expect that.
>>>> > >>
>>>> > >>
>>>> > >> > For the R Core Team,
>>>> > >>
>>>> > >> > Martin Maechler,
>>>> > >> > ETH Zurich
>>>> > >>
>>>> > >> ______________________________________________
>>>> > >> R-devel at r-project.org mailing list
>>>> > >> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> > >>
>>>> >
>>>> >
>>>> >
>>>> > > --
>>>> > > Robin Hankin
>>>> > > Neutral theorist
>>>> > > hankin.robin at gmail.com
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Gabriel Becker, PhD
>>>> Associate Scientist (Bioinformatics)
>>>> Genentech Research
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Gabriel Becker, PhD
>> Associate Scientist (Bioinformatics)
>> Genentech Research
>>