[R] factor documentation issue

Heinz Tuechler tuechler at gmx.at
Wed Feb 28 10:49:52 CET 2007

At 09:41 28.02.2007 +1030, Geoff Russell wrote:
>There is a warning in the documentation for ?factor  (R version 2.3.0)
>as follows:
>" The interpretation of a factor depends on both the codes and the
>  '"levels"' attribute.  Be careful only to compare factors with the
>  same set of levels (in the same order).  In particular,
>  'as.numeric' applied to a factor is meaningless, and may happen by
>  implicit coercion.  To "revert" a factor 'f' to its original
>  numeric values, 'as.numeric(levels(f))[f]' is recommended and
>  slightly more efficient than 'as.numeric(as.character(f))'."
>But as.numeric seems to work fine whereas as.numeric(levels(f))[f] doesn't
>always do anything useful.
>For example:
>> f<-factor(1:3,labels=c("A","B","C"))
>> f
>[1] A B C
>Levels: A B C
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] NA NA NA
>Warning message:
>NAs introduced by coercion
>And also,
>> f<-factor(1:3,labels=c(1,5,6))
>> f
>[1] 1 5 6
>Levels: 1 5 6
>> as.numeric(f)
>[1] 1 2 3
>> as.numeric(levels(f))[f]
>[1] 1 5 6
>Is the documentation wrong, or is the code wrong, or have I missed
>Geoff Russell

From "R Language Definition":

"2.3.1 Factors

Factors are used to describe items that can have a finite number of values
(gender, social class, etc.). 
Factors are currently implemented using an integer array to specify the
actual levels and a second array of names that are mapped to the integers.
Rather unfortunately users often make use of the implementation in order to
make some calculations easier. This, however, is an implementation issue
and is not guaranteed to hold in all implementations of R."
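
To make the quoted description concrete, here is a minimal sketch (my own example data, not from the manual) of the integer codes and the level names, and of why as.numeric(f) and as.numeric(levels(f))[f] give different results. It relies on the current implementation, which is exactly what the passage above warns about.

```r
# A factor whose level names happen to look like numbers (made-up data).
f <- factor(c("5", "1", "6", "1"))

codes <- as.integer(f)   # internal integer codes: positions in levels(f)
lev   <- levels(f)       # character vector of level names, sorted

# as.numeric(f) returns the codes, not the numbers the level names spell out.
from_codes <- as.numeric(f)

# Convert the level names once, then index by the factor to recover the values.
from_levels <- as.numeric(levels(f))[f]

print(codes)
print(lev)
print(from_codes)
print(from_levels)
```

With level names such as "A", "B", "C" the conversion as.numeric(levels(f)) cannot produce numbers, which is where the NAs in the first example of the question come from.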

In my view factors are (mis)used in different, not necessarily connected ways.
A factor may represent a statistical concept, i.e. a categorical variable.
Further, it may be an (internal) way of data reduction or some method for
labelling values.
In my view these concepts should not be mixed up, and I would recommend
avoiding factors for data reduction and labelling.
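
One possible way to keep these uses apart, as a rough sketch only: the variable names and the data below are hypothetical. The labels live in a separate named lookup vector, and a factor is created only where a categorical variable is actually meant.

```r
# Hypothetical raw data: numeric codes as they might arrive in a data file.
social_class <- c(1L, 3L, 2L, 1L)

# Labelling kept separate from the data, as a named lookup vector.
class_labels <- c("1" = "low", "2" = "middle", "3" = "high")
labelled <- unname(class_labels[as.character(social_class)])

# A factor only where a categorical variable is needed, e.g. for tabulation.
class_factor <- factor(labelled, levels = c("low", "middle", "high"))
counts <- table(class_factor)

print(labelled)
print(counts)
```

The original codes stay untouched in social_class, so there is no need to "revert" anything later.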

