[Rd] [bug] droplevels() also drop object attributes (comment…)
Suharto Anggono
suharto_anggono at yahoo.com
Thu Jun 8 18:43:48 CEST 2017
* Be careful with the "contrasts" attribute. If the number of levels is reduced, the original contrasts matrix is no longer valid, because it still has one row per original level.
Example case:
x <- factor(c("a", "a", "b", "b", "b"), levels = c("a", "b", "c"))
contrasts(x) <- contr.treatment(levels(x), contrasts=FALSE)[, -2, drop=FALSE]
droplevels(x)
* If function 'factor' is changed, make sure that as.factor(x) and factor(x) are the same for 'x' where is.integer(x) is TRUE. Currently, as.factor(<integer>) is treated specially.
* It is possible that names(x) is not attr(x, "names"), for example when 'x' is a "POSIXlt" object.
Look at this example, which works in R 3.3.2.
x <- as.POSIXlt("2017-01-01", tz="UTC")
factor(x, levels=x)
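A minimal sketch for checking the three points above on a given R build. The variable names (y, ctr, i, p and the three flags) are illustrative only and not part of any proposed patch, and the value of 'contrasts_ok' depends on how factor() treats attributes in the R version at hand.

```r
# 1. "contrasts": compare the stored matrix with the remaining levels.
x <- factor(c("a", "a", "b", "b", "b"), levels = c("a", "b", "c"))
contrasts(x) <- contr.treatment(levels(x), contrasts = FALSE)[, -2, drop = FALSE]
y <- droplevels(x)
ctr <- attr(y, "contrasts")
# A stored contrasts matrix is only usable with one row per remaining level.
contrasts_ok <- is.null(ctr) || (is.matrix(ctr) && nrow(ctr) == nlevels(y))

# 2. Integer input: as.factor() and factor() should give the same object.
i <- c(3L, 1L, 10L, 3L)
same_factor <- identical(as.factor(i), factor(i))

# 3. "POSIXlt": names() goes through a method, while attr(, "names")
#    reads the names of the underlying list components.
p <- as.POSIXlt("2017-01-01", tz = "UTC")
names_differ <- !identical(names(p), attr(p, "names"))

print(c(contrasts_ok = contrasts_ok,
        same_factor  = same_factor,
        names_differ = names_differ))
```

If 'contrasts_ok' comes out FALSE, a stale contrasts matrix has been carried over to the factor with fewer levels, which is the case the first point warns about.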
By the way, NEWS, in "CHANGES IN R 3.4.0" under "SIGNIFICANT USER-VISIBLE CHANGES", says "factor() now uses order() to sort its levels". That is not accurate: the code of function 'factor' in R 3.4.0 (https://svn.r-project.org/R/tags/R-3-4-0/src/library/base/R/factor.R) still uses 'sort.list', not 'order'.
--------------------------------
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Tue, 16 May 2017 11:01:23 +0200 writes:
>>>>> Serge Bibauw <sbibauw at gmail.com>
>>>>> on Mon, 15 May 2017 11:59:32 -0400 writes:
>> Hi,
>> Just reporting a small bug… not really a big deal, but I
>> don’t think that is intended: droplevels() also drops all
>> object’s attributes.
> Yes. The help page for droplevels (or the simple
> definition of 'droplevels.factor') clearly indicates that
> the method for factors is really just a call to factor(x,
> exclude = *)
> and that _is_ quite an important base function whose
> semantics should not be changed lightly. Still, let's
> continue :
> Looking a bit, I see that the current behavior of factor()
> {and hence droplevels} has been unchanged in this respect
> for the whole history of R, well, at least for more than
> 17 years (R 1.0.1, April 2000).
> I'd agree there _is_ a bug, at least in the documentation
> which does *not* mention that currently, all attributes
> are dropped but "names", "levels" (and "class").
> OTOH, factor() would only need a small change to make it
> preserve all attributes (but "class" and "levels" which
> are set explicitly).
> I'm sure this will break some checks in some packages. Is
> it worth it?
> e.g., our own R QC checks currently check (the printing of) the
> following (in tests/reg-tests-2.R ):
> > ## some tests of factor matrices
> > A <- factor(7:12)
> > dim(A) <- c(2, 3)
> > A
>      [,1] [,2] [,3]
> [1,] 7    9    11
> [2,] 8    10   12
> Levels: 7 8 9 10 11 12
> > str(A)
> factor [1:2, 1:3] 7 8 9 10 ...
> - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
> > A[, 1:2]
>      [,1] [,2]
> [1,] 7    9
> [2,] 8    10
> Levels: 7 8 9 10 11 12
> > A[, 1:2, drop=TRUE]
> [1] 7 8 9 10
> Levels: 7 8 9 10
>
> with the proposed change to factor(),
> the last call would change its result:
>
> > A[, 1:2, drop=TRUE]
>      [,1] [,2]
> [1,] 7    9
> [2,] 8    10
> Levels: 7 8 9 10
> because 'drop=TRUE' calls factor(..) and that would also
> preserve the "dim" attribute. I would think that the
> changed behavior _is_ better, and is also according to
> documentation, because the help page for [.factor explains
> that 'drop = TRUE' drops levels, but _not_ that it
> transforms a factor matrix into a factor (vector).
> Martin
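To see which attributes survive on a particular R build, something along the following lines can be run. This is only a sketch: the extra attribute name "note" and the variables B and C are made up for illustration, and whether "dim" or "note" appears in the output depends on whether the change to factor() discussed above is in that build.

```r
# Factor matrix from the quoted regression test.
A <- factor(7:12)
dim(A) <- c(2, 3)

# Hypothetical extra attribute, used only to watch what gets dropped.
attr(A, "note") <- "example"

B <- A[, 1:2, drop = TRUE]  # drop = TRUE re-creates the factor via factor()
C <- droplevels(A)          # all six levels are in use, so none is dropped

print(levels(B))            # the unused levels "11" and "12" are removed
print(names(attributes(B))) # shows whether "dim" was kept
print(names(attributes(C))) # shows whether "dim" and "note" were kept
```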
I'm finally coming back to this.
It still seems to make sense to change the behavior of factor(), and
hence of droplevels(), here, and I plan to commit this change
within a day.
Martin Maechler
ETH Zurich