[Rd] table(exclude = NULL) always includes NA

Suharto Anggono suharto_anggono at yahoo.com
Sat Sep 10 04:36:54 CEST 2016


Looking at the code of function 'table' in R devel r71227, I see that the part "remove NA level if it was added only for excluded in factor(a, exclude=.)" is not quite right.

In
		is.na(a) <- match(a0, c(exclude,NA), nomatch=0L)   ,
I think that what is intended is
                a[a0 %in% c(exclude,NA)] <- NA  .
So, it should be
		is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) > 0L
or
		is.na(a) <- as.logical(match(a0, c(exclude,NA), nomatch=0L))  .
The parallel code
		is.na(a) <- match(a0,   exclude,     nomatch=0L)
is to be treated similarly.
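
The difference between the two forms can be seen on a plain integer vector, outside of table(). This is only a minimal sketch: the names a_int and a_lgl are introduced here for illustration, and a0 and exclude are taken from the example below rather than from the internals of table().

```r
a0 <- 3:1
exclude <- 1

# match() returns positions in c(exclude, NA), not positions in a0.
idx <- match(a0, c(exclude, NA), nomatch = 0L)
idx                        # 0 0 1

# Used directly, idx acts as an integer subscript: zeros are dropped and
# element 1 of the vector is set to NA, although a0[1] is 3, not 1.
a_int <- a0
is.na(a_int) <- idx
a_int                      # NA 2 1

# Converted to logical, it marks exactly the elements that matched.
a_lgl <- a0
is.na(a_lgl) <- idx > 0L
a_lgl                      # 3 2 NA
```

Here idx > 0L is the same logical vector as a0 %in% c(exclude, NA).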

Example that gives a wrong result in R-devel r71225:
table(3:1, exclude = 1)
table(3:1, exclude = 1, useNA = "always")
--------------------------------------------
On Tue, 16/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA

 Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
 Date: Tuesday, 16 August, 2016, 5:42 PM

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 15 Aug 2016 12:35:41 +0200 writes:

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 15 Aug 2016 11:07:43 +0200 writes:


>>>>>     on Sun, 14 Aug 2016 03:42:08 +0000 writes:

    >>> useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "ifany"
    >>> An example where it changes the 'table' result for non-factor input, from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :

    >>> x <- c(1,2,3,3,NA)
    >>> table(as.integer(x), exclude=NaN)

    >>> I bring the example up, in case that the change in result is not intended.

    >> Thanks a lot, Suharto.

    >> To me, the example is convincing that the changes (I committed
    >> Friday), svn rev 71087 & 71088, are a clear improvement:

    >> (As you surely know, but not all the other readers:)
    >> Before the change, the above example gave *different* results
    >> for  'x'  and  'as.integer(x)', the integer case *not* counting the NAs,
    >> whereas with the change in effect, they are the same:

    >>> x <- as.integer(dx <- c(1,2,3,3,NA))
    >>> table(x, exclude=NaN); table(dx, exclude=NaN)
    >> x
    >> 1    2    3 <NA> 
    >> 1    1    2    1 
    >> dx
    >> 1    2    3 <NA> 
    >> 1    1    2    1 
    >>> 

    >> --
    >> But the change has affected 6-8 (of the 8000+) CRAN packages,
    >> which I am investigating now; I will probably be in contact with
    >> the package maintainers after that.

    > There has been another bug in table(), since the time  'useNA'
    > was introduced, which gives (in released R, R-patched, or R-devel):

    >> table(1:3, exclude = 1, useNA = "ifany")

    > 2    3 <NA> 
    > 1    1    1 
    >> 

    > and that bug now (in R-devel, after my changes) triggers in
    > cases it did not previously, notably in

    > table(1:3, exclude = 1)

    > which now does set 'useNA = "ifany"' and so gives the same silly
    > result as the one above.

    > The reason for this bug is that   addNA(..)  is called (in all R
    > versions mentioned) in this case, but it should not.

    > I'm currently testing yet another amendment..

which was not sufficient... so I had to do *much* more work.

The result is code which functions -- I hope -- uniformly better
than the current code, but unfortunately, code that is much longer.

After all I came to the conclusion that using addNA() was not
good enough [I did not yet consider *changing* addNA() itself,
even though the only place we use it in R's own packages is
inside table()] and so for now have code in table() that does
the equivalent of addNA() *but* does remember if addNA() did add
an NA level or not.
I also have extended the regression tests considerably,
*and*  example(table)  now reverts to give identical output to
R 3.3.1 (which it no longer did in R-devel (r 71088)).
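
The addNA() issue can be reproduced without table() itself. The following is a minimal sketch of the mechanism, not the code in table(): add_na_level is a hypothetical helper name, introduced here only to illustrate what "remembering" whether an NA level was added might look like.

```r
# Excluding the value 1 turns that element into NA; the levels are "2" "3".
f <- factor(1:3, exclude = 1)
levels(f)

# addNA() then adds an NA level, so the excluded element is counted as <NA>.
g <- addNA(f, ifany = TRUE)
levels(g)
tabulate(as.integer(g), nbins = nlevels(g))   # 1 1 1

# Hypothetical helper (not part of R): does the equivalent of addNA(), but
# also reports whether the NA level is new, so a caller could drop it again.
add_na_level <- function(x) {
  stopifnot(is.factor(x))
  lev <- levels(x)
  added <- !anyNA(lev)
  if (added) lev <- c(lev, NA)
  list(factor = factor(x, levels = lev, exclude = NULL), added = added)
}

res <- add_na_level(f)
res$added                                     # TRUE
```

A caller that knows the NA level was added only because of excluded values could then remove it before counting, which plain addNA() gives no way to detect.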

I'm still investigating the CRAN package fallout (from the above
change 4 days ago) but plan to commit my (unfortunately
somewhat extensive) changes.

Also, I think this will become the first in this year's R-devel

SIGNIFICANT USER-VISIBLE CHANGES:

  • ‘table()’ has been amended to be more internally consistent
    and become back compatible to R <= 2.7.2 again.
    Consequently, ‘table(1:2, exclude=NULL)’ no longer contains
    a zero count for ‘<NA>’, but ‘useNA = "always"’ continues to
    do so.
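
The two calls named in the entry can be compared directly. This is only a sketch for checking the behaviour on a given installation; the result of the first call depends on the R version in use, so no output is shown for it. The names t_null and t_always are introduced here for illustration.

```r
# With the change described above, this should have no zero <NA> cell.
t_null <- table(1:2, exclude = NULL)
names(t_null)

# useNA = "always" keeps an <NA> cell even when no NA is present.
t_always <- table(1:2, useNA = "always")
names(t_always)
as.vector(t_always)        # 1 1 0
```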


--
Martin


