[Rd] duplicated factor labels.
Martin Maechler
maechler at stat.math.ethz.ch
Thu Jun 15 17:15:17 CEST 2017
>>>>> Paul Johnson <pauljohn32 at gmail.com>
>>>>> on Wed, 14 Jun 2017 19:00:11 -0500 writes:
> Dear R devel
> I've been wondering about this for a while. I am sorry to ask for your
> time, but can one of you help me understand this?
> This concerns duplicated labels, not levels, in the factor function.
> I think it is hard to understand that factor() fails, but a later
> levels() assignment does not
>> x <- 1:6
>> xlevels <- 1:6
>> xlabels <- c(1, NA, NA, 4, 4, 4)
>> y <- factor(x, levels = xlevels, labels = xlabels)
> Error in `levels<-`(`*tmp*`, value = if (nl == nL)
> as.character(labels) else paste0(labels, :
> factor level [3] is duplicated
>> y <- factor(x, levels = xlevels)
>> levels(y) <- xlabels
>> y
> [1] 1 <NA> <NA> 4 4 4
> Levels: 1 4
> If the latter use of levels() causes a good, expected result, couldn't
> factor(..., labels = xlabels) be made to do the same thing?
I may misunderstand, but I think you are confusing 'labels' and 'levels'
here (and you are not alone in this!), mostly because R's
factor() function treats them as arguments in a way that can be
confusing. (But I don't think we'd want to change that; it has
been documented and in use for > 25 years, in S, S+, and R.)
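
To illustrate the distinction, here is a minimal sketch. The data and the
label names are made up for illustration and are not taken from the
question: 'levels' says which values of x to look for, while 'labels' says
what the resulting levels should be called.

```r
# Illustrative data only (not from the original question).
x <- c("b", "a", "c", "a")

# 'levels' lists the values expected in x, in the desired order;
# 'labels' gives the names under which those levels are stored.
f <- factor(x, levels = c("a", "b", "c"),
            labels = c("Alpha", "Beta", "Gamma"))

levels(f)        # the labels become the levels of the result
as.integer(f)    # integer codes refer to positions in 'levels'
as.character(f)  # values are reported under their labels
```

So after construction, the original 'levels' are gone and only the 'labels'
survive, as levels(f).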
Note that after your second variant above (the levels(y) <- xlabels one),
> dput(y)
structure(c(1L, NA, NA, 2L, 2L, 2L), .Label = c("1", "4"), class = "factor")
and that of course _is_ a valid factor, which you can easily
get directly via e.g.
> identical(y, factor(c(1,NA,NA,4,4,4)))
[1] TRUE
or also via
> identical(y, factor(c("1",NA,NA,"4","4","4")))
[1] TRUE
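
The same idea can be sketched as a one-step helper that looks up each
value's label first and only then calls factor(). Note that
relabel_factor() is a hypothetical name used for illustration, not a base R
function, and this is only one possible way to write it.

```r
# Hypothetical helper (not part of base R): map each value of x to its
# label first, then let factor() build the levels from the mapped values.
relabel_factor <- function(x, levels, labels) {
  stopifnot(length(levels) == length(labels))
  factor(labels[match(x, levels)])
}

x <- 1:6
xlevels <- 1:6
xlabels <- c(1, NA, NA, 4, 4, 4)

# One-step version via the helper.
y1 <- relabel_factor(x, xlevels, xlabels)

# Two-step version from the question: build the factor, then relabel.
y2 <- factor(x, levels = xlevels)
levels(y2) <- xlabels

identical(y1, y2)
```

One caveat with this sketch: the levels come out in the sorted order of the
distinct labels, which coincides with the order of xlabels here but need not
do so in general.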
I really don't see a need for a change of factor().
It should remain as simple as possible (but not simpler :-).
Martin