[Rd] Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)
Suharto Anggono Suharto Anggono
suharto_anggono at yahoo.com
Fri Sep 2 18:10:00 CEST 2016
I am basically fine with the change.
How about using just the following?
if(!is.character(exclude))
exclude <- as.vector(exclude, typeof(x)) # may result in NA
x <- as.character(x)
It looks simpler and is, more or less, equivalent.
In factor.Rd, in description of argument 'exclude', "(when \code{x} is a \code{factor} already)" can be removed.
A larger change that, I think, is reasonable is entirely removing the code
exclude <- as.vector(exclude, typeof(x)) # may result in NA
The explicit coercion of 'exclude' is not necessary. Function 'factor' works without it.
The coercion of 'exclude' may leads to a surprise because it "may result in NA". Example from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
factor(as.integer(c(1,2,3,3,NA)), exclude=NaN)
excludes NA.
As a bonus, without the coercion of 'exclude', 'exclude' can be a factor if 'x' is a factor. This part of an example in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html works.
cc <- c("x","y","NA")
ff <- factor(cc)
factor(ff,exclude=ff[3])
However, the coercion of 'exclude' has been in function 'factor' in R "forever".
--------------------------------------------
On Wed, 31/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Subject: Re: [Rd] 'droplevels' inappropriate change
Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
Date: Wednesday, 31 August, 2016, 2:51 PM
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Sat, 27 Aug 2016 03:17:32 +0000 writes:
>> In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'.
>> passed to factor(); factor levels which should be excluded from the result even if present. Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation. The current default is compatible with x[ , drop=FALSE].
>> The part
>> x[ , drop=FALSE]
>> should be
>> x[ , drop=TRUE]
[[elided Yahoo spam]]
> a "typo" by me. .. fixed now.
>> Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result.
>> For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
>> factor(factor(c("a","b","c")), exclude="c")
>> However, this excludes "2":
>> factor(factor(2:3), exclude=2)
>> Rather unexpectedly, this excludes NA:
>> factor(factor(c("a",NA), exclude=NULL), exclude="c")
>> For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .
> Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
> years ago) is confirming the problem there, and suggesting (as
> you, right?) that actually `factor()` is not behaving
> correctly here.
> And your persistence is finally getting close to convince me
> that it is not just droplevels(), but factor() itself which
> needs care here.
> Interestingly, the following patch *does* pass 'make check-all'
> (after small change in tests/reg-tests-1b.R which is ok),
> and leads to behavior which is much closer to the documentation,
> notably for your two examples above would give what one would
> expect.
> (( If the R-Hub would support experiments with branches of R-devel
> from R-core members, I could just create such a branch and R Hub
> would run 'R CMD check <pkg>' for thousands of CRAN packages
> and provide a web page with the *differences* in the package
> check results ... so we could see ... ))
> I do agree that we should strongly consider such a change.
as nobody has commented, I've been liberal and have taken these
no comments as consent.
Hence I have committed
------------------------------------------------------------------------
r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line
Changed paths:
M /trunk/doc/NEWS.Rd
M /trunk/src/library/base/R/factor.R
M /trunk/src/library/base/man/factor.Rd
M /trunk/tests/reg-tests-1b.R
M /trunk/tests/reg-tests-1c.R
factor(x, exclude) more "rational" when x or exclude are character
------------------------------------------------------------------------
which apart from documentation, examples, and regression tests
is just the patch below.
Martin Maechler
ETH Zurich
> --- factor.R (revision 71157)
> +++ factor.R (working copy)
> @@ -28,8 +28,12 @@
> levels <- unique(y[ind])
> }
> force(ordered) # check if original x is an ordered factor
> - exclude <- as.vector(exclude, typeof(x)) # may result in NA
> - x <- as.character(x)
> + if(!is.character(x)) {
> + if(!is.character(exclude))
> + exclude <- as.vector(exclude, typeof(x)) # may result in NA
> + x <- as.character(x)
> + } else
> + exclude <- as.vector(exclude, typeof(x)) # may result in NA
> ## levels could be a long vectors, but match will not handle that.
> levels <- levels[is.na(match(levels, exclude))]
> f <- match(x, levels)
Delete Reply Reply All Forward Apply
More information about the R-devel
mailing list