[Rd] Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)
Martin Maechler
maechler at stat.math.ethz.ch
Fri Sep 30 14:51:52 CEST 2016
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Tue, 13 Sep 2016 18:33:35 +0200 writes:
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Fri, 2 Sep 2016 16:10:00 +0000 writes:
>> I am basically fine with the change.
>> How about using just the following?
>> if(!is.character(exclude))
>> exclude <- as.vector(exclude, typeof(x)) # may result in NA
>> x <- as.character(x)
>> It looks simpler and is, more or less, equivalent.
> yes, but the current code should be slightly faster..
>> In factor.Rd, in description of argument 'exclude', "(when \code{x} is a \code{factor} already)" can be removed.
>> A larger change that, I think, is reasonable is entirely removing the code
>> exclude <- as.vector(exclude, typeof(x)) # may result in NA
>> The explicit coercion of 'exclude' is not necessary.
>> Function 'factor' works without it.
>> The coercion of 'exclude' may lead to a surprise because it "may result in NA".
>> Example from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
>> factor(as.integer(c(1,2,3,3,NA)), exclude=NaN)
>> excludes NA.
>> As a bonus, without the coercion of 'exclude', 'exclude' can be a factor if 'x' is a factor. This part of an example in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html works.
>> cc <- c("x","y","NA")
>> ff <- factor(cc)
>> factor(ff,exclude=ff[3])
> Yes, two good reasons for a change. factor() would finally
> behave according to the documentation which has been mentioning
> that 'exclude' could be a factor: ((Until my R-devel changes of a
> few weeks ago, i.e. at least in all recent released versions of R)),
> the help page for factor has said
> || If 'exclude' is used it should either be a factor with the same
> || level set as 'x' or a set of codes for the levels to be excluded.
>> However, the coercion of 'exclude' has been in function 'factor' in R "forever".
> Indeed: On March 6, 1998, svn rev. 845, when the R source files got a
> '.R' appended, and quite a long time before R 1.0.0,
> the factor function was as short as (but using an .Internal(.) !)
> function (x, levels = sort(unique(x), na.last = TRUE), labels, exclude = NA,
> ordered = FALSE)
> {
> if (length(x) == 0)
> return(character(0))
> exclude <- as.vector(exclude, typeof(x))
> levels <- levels[is.na(match(levels, exclude))]
> x <- .Internal(factor(match(x, levels), length(levels),
> ordered))
> if (missing(labels))
> levels(x) <- levels
> else levels(x) <- labels
> x
> }
> and already contained that line.
> Nevertheless, simplifying factor() by removing that line (or those
> 2 lines, now!) seems to only have advantages....
> I'm not yet committing to anything, but currently would strongly
> consider it .. though *after* the
> '<length-1-array> OP <non-array>'
> issue has settled a bit.
(Which it has; the decision has been to keep it.)
I have now committed Suharto's proposal above, to entirely drop the
exclude <- as.vector(exclude, typeof(x))
parts
in the factor() function... which has the two advantages
mentioned above and simplifies the code (and documentation).
------------------------------------------------------------------------
r71424 | maechler | 2016-09-30 14:38:43 +0200 (Fri, 30 Sep 2016) | 1 line
Changed paths:
M /trunk/doc/NEWS.Rd
M /trunk/src/library/base/R/factor.R
M /trunk/src/library/base/man/factor.Rd
M /trunk/tests/reg-tests-1c.R
simplify factor(), allowing 'exclude= <factor>' as documented in R <= 3.3.x
------------------------------------------------------------------------
I do expect some "reaction" in CRAN/Bioconductor package space,
so the final word has not been spoken on this, but the new code
is more aestethic to me.
Thank you for the suggestion,
Martin
>> --------------------------------------------
>> On Wed, 31/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> Subject: Re: [Rd] 'droplevels' inappropriate change
>> Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
>> Date: Wednesday, 31 August, 2016, 2:51 PM
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Sat, 27 Aug 2016 03:17:32 +0000 writes:
>>>> In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'.
>>>> passed to factor(); factor levels which should be excluded from the result even if present. Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation. The current default is compatible with x[ , drop=FALSE].
>>>> The part
>>>> x[ , drop=FALSE]
>>>> should be
>>>> x[ , drop=TRUE]
>> [[elided Yahoo spam]]
>>> a "typo" by me. .. fixed now.
>>>> Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result.
>>>> For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
>>>> factor(factor(c("a","b","c")), exclude="c")
>>>> However, this excludes "2":
>>>> factor(factor(2:3), exclude=2)
>>>> Rather unexpectedly, this excludes NA:
>>>> factor(factor(c("a",NA), exclude=NULL), exclude="c")
>>>> For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .
>>> Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
>>> years ago) is confirming the problem there, and suggesting (as
>>> you, right?) that actually `factor()` is not behaving
>>> correctly here.
>>> And your persistence is finally getting close to convince me
>>> that it is not just droplevels(), but factor() itself which
>>> needs care here.
>>> Interestingly, the following patch *does* pass 'make check-all'
>>> (after small change in tests/reg-tests-1b.R which is ok),
>>> and leads to behavior which is much closer to the documentation,
>>> notably for your two examples above would give what one would
>>> expect.
>>> (( If the R-Hub would support experiments with branches of R-devel
>>> from R-core members, I could just create such a branch and R Hub
>>> would run 'R CMD check <pkg>' for thousands of CRAN packages
>>> and provide a web page with the *differences* in the package
>>> check results ... so we could see ... ))
>>> I do agree that we should strongly consider such a change.
>> as nobody has commented, I've been liberal and have taken these
>> no comments as consent.
>> Hence I have committed
>> ------------------------------------------------------------------------
>> r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line
>> Changed paths:
>> M /trunk/doc/NEWS.Rd
>> M /trunk/src/library/base/R/factor.R
>> M /trunk/src/library/base/man/factor.Rd
>> M /trunk/tests/reg-tests-1b.R
>> M /trunk/tests/reg-tests-1c.R
>> factor(x, exclude) more "rational" when x or exclude are character
>> ------------------------------------------------------------------------
>> which apart from documentation, examples, and regression tests
>> is just the patch below.
>> Martin Maechler
>> ETH Zurich
>>> --- factor.R (revision 71157)
>>> +++ factor.R (working copy)
>>> @@ -28,8 +28,12 @@
>>> levels <- unique(y[ind])
>>> }
>>> force(ordered) # check if original x is an ordered factor
>>> - exclude <- as.vector(exclude, typeof(x)) # may result in NA
>>> - x <- as.character(x)
>>> + if(!is.character(x)) {
>>> + if(!is.character(exclude))
>>> + exclude <- as.vector(exclude, typeof(x)) # may result in NA
>>> + x <- as.character(x)
>>> + } else
>>> + exclude <- as.vector(exclude, typeof(x)) # may result in NA
>>> ## levels could be a long vectors, but match will not handle that.
>>> levels <- levels[is.na(match(levels, exclude))]
>>> f <- match(x, levels)
>> Delete Reply Reply All Forward Apply
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list