[Rd] problem in levels<- and other inconsistencies
Dr. Jens Oehlschlägel
Jens.Oehlschlaegel at truecluster.com
Tue Sep 27 21:33:39 CEST 2016
# A couple of years ago
# I helped make R's character NA handling more consistent
# Today I report an issue with R's factor NA handling
# The core problem is that
# levels(g) <- levels(g)
# can change the levels of g
# more details below
# Kind regards
# Jens Oehlschlägel
# Say I have an NA element in a vector or list
x <- c("a","b",NA)
# then using split() it gets lost
split(x, x)
# as it is (somewhat) when converting to a default factor
table(as.factor(x))
# for table the workaround is
table(as.factor(x), exclude=NULL)
# but for split we need
f <- factor(x, exclude=NULL)
split(x, f)
# conclusion: we MUST use an NA level
# so far so good
g <- f
levels(g)
# but re-assigning the levels changes them
levels(g) <- levels(g)
levels(g)
# which I consider a severe problem.
# Yes, I read the help page of levels<-
# about removing levels by assigning NAs to them
# but that implies: we MUST NOT use an NA level
# If a language suggests
# that we MUST and we MUST NOT use an NA level
# the language has limited usefulness
# (and a user who depends on the language
# is put into a DOUBLE BIND)
# SUGGESTION: ensure that the above assignment does not change the levels
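# Until such a change is made, a possible workaround (a sketch only,
# f0 and g0 are illustrative names) is to set the "levels" attribute
# directly, which bypasses the levels<- method for factors
# and therefore should keep the NA level
f0 <- factor(c("a","b",NA), exclude=NULL)
g0 <- f0
attr(g0, "levels") <- levels(f0)
identical(levels(g0), levels(f0))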
# trying to apply the levels of f to new data also fails
g <- factor(x, levels=levels(f))
g
# and giving both levels= and labels= even stops with an error
h <- factor(x, levels=levels(f), labels=levels(f))
# I do understand that exclude= is meaningful
# if levels= are to be determined automatically, but
# SUGGESTION: with explicit levels= exclude= should be ignored.
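# For now, passing exclude=NULL explicitly together with levels=
# appears to keep the NA level (a sketch only,
# f1 and g1 are illustrative names)
f1 <- factor(c("a","b",NA), exclude=NULL)
g1 <- factor(c("a","b",NA), levels=levels(f1), exclude=NULL)
levels(g1)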
# SUGGESTION: give split(x, y, exclude=NA) an exclude= argument,
# which when set to NULL will prevent dropping NA levels
# when coercing y to factor
# (it still remains open what should have priority
# if y is a factor with an NA-level and exclude=NA)
table(f, exclude=NA)
# here existing levels win over exclude=
# which is consistent with my suggestion for factor(, levels=, exclude=)
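# A minimal sketch of the suggested split() behaviour as a user-level wrapper.
# split_na() is a hypothetical helper, not an existing R function,
# and it assumes that y is not yet a factor
split_na <- function(x, y, exclude=NA) split(x, factor(y, exclude=exclude))
# the default exclude=NA drops the NA group, exclude=NULL keeps it
s_drop <- split_na(c("a","b",NA), c("a","b",NA))
s_keep <- split_na(c("a","b",NA), c("a","b",NA), exclude=NULL)
length(s_drop)
length(s_keep)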