[R] Factor to numeric conversion - as.numeric(levels(f))[f] - Language definition seems to say to not use this.

Peter Ehlers ehlers at ucalgary.ca
Mon Apr 1 23:29:55 CEST 2013


On 2013-04-01 13:08, Matthew Lundberg wrote:
> Note the edited subject line!  I don't know why I typed it as it was before.
>
> This says that as.numeric(as.character(f)) will work regardless of the
> implementation, and I agree.
>
> It's the recommendation to use as.numeric(levels(f))[f] that has me
> wondering about section 2.3.1 of the language definition.  I expect that
> this idiom is in widespread use, and perhaps the language definition
> should be changed.
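For reference, here is a small R sketch of the two idioms under discussion, using an arbitrary example factor. Both should return the original values. Both also avoid the common mistake of calling as.numeric(f) directly, which returns the internal codes instead:

```r
# Arbitrary example: numeric values stored as a factor
f <- factor(c(2, 4, 5, 4))

as.numeric(f)                    # 1 2 3 2: the codes, not the values
as.numeric(as.character(f))      # 2 4 5 4: converts one label per element
as.numeric(levels(f))[f]         # 2 4 5 4: converts each level once, then indexes by f

# The two recommended idioms agree
stopifnot(identical(as.numeric(as.character(f)), as.numeric(levels(f))[f]))
```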

I think that I may be getting an inkling of what your complaint is:
section 2.3.1 talks about
  "an integer array to specify the _actual_ levels" [emphasis added]
and
  "a second array of _names_ that are mapped to the integers". [ditto]

When you object to the use of "as.numeric(levels(f))[f]", are you
assuming that "levels(f)" is the set of _integers_ or the set of
_names_?

Anyway, it's indeed the set of names, as returned by the levels()
function.
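The distinction can be seen directly in R (example values arbitrary). levels() returns a character vector of names, and the factor itself carries the integer codes that index into it:

```r
f <- factor(c(2, 4, 5, 4))

levels(f)                   # "2" "4" "5": the names, stored as character
is.character(levels(f))     # TRUE
as.integer(f)               # 1 2 3 2: the integer codes
levels(f)[as.integer(f)]    # "2" "4" "5" "4": codes mapped back to names
```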

Peter Ehlers

>
>
> On Mon, Apr 1, 2013 at 2:58 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>
>     Yup. Note also:
>
>      > as.character.factor
>     function (x, ...)
>     levels(x)[x]
>
>     But of course this is OK, since this definition would change along with
>     the implementation. That is the whole point.
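Bert's point can be checked in practice. The printed body of as.character.factor may differ between R versions, so this sketch compares only the results, using an arbitrary factor that includes an NA element:

```r
f <- factor(c("lo", "hi", NA, "hi"))

# Indexing the levels by the factor gives the same character vector
# as the documented conversion; the NA code maps to NA_character_
identical(as.character(f), levels(f)[f])   # TRUE
```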
>
>     -- Bert
>
>
>
>     On Mon, Apr 1, 2013 at 12:16 PM, Matthew Lundberg
>     <matthew.k.lundberg at gmail.com> wrote:
>      >
>      > When used as an index, the factor is implicitly converted to integer.
>      > In the expression as.numeric(levels(f))[f], the vector
>      > as.numeric(levels(f)) is indexed by as.integer(f).
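Matthew's point about indexing can be checked directly. In this sketch, which uses an arbitrary named vector, x[f] selects by the factor's codes rather than its labels. Selecting by label needs as.character(f):

```r
x <- c(a = 10, b = 20, c = 30)
f <- factor(c("c", "a"))     # levels "a" "c"; codes 2 1

x[f]                 # b = 20, a = 10: uses the codes 2 and 1
x[as.integer(f)]     # same as x[f]
x[as.character(f)]   # c = 30, a = 10: uses the labels
```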
>      >
>      > This appears to rely on the current implementation, as mentioned in
>      > section 2.3.1 of the language definition.
>      >
>      >
>      > On Mon, Apr 1, 2013 at 1:49 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>      >
>      > > On 2013-04-01 10:48, Matthew Lundberg wrote:
>      > >
>      > >> These two seem to be at odds.  Is this the case?
>      > >>
>      > >> From help(factor) - section Warning:
>      > >>
>      > >> To transform a factor f to approximately its original numeric values,
>      > >> as.numeric(levels(f))[f] is recommended and slightly more efficient than
>      > >> as.numeric(as.character(f)).
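The following sketch illustrates two details of that help text, using arbitrary example values. "Approximately" reflects the fact that the levels are stored as character strings with about 15 significant digits, so a round trip need not be exact. "Slightly more efficient" reflects the fact that the levels idiom calls as.numeric() on nlevels(f) strings rather than on length(f) of them:

```r
x <- c(1/3, 2/3, 1/3)
f <- factor(x)
back <- as.numeric(levels(f))[f]

back == x                     # not guaranteed TRUE: levels keep about 15 significant digits
isTRUE(all.equal(back, x))    # TRUE: equal within numerical tolerance

# Rough timing comparison on a long factor; timings vary by machine
big <- factor(sample(c("1.5", "2.25", "3"), 1e6, replace = TRUE))
system.time(as.numeric(as.character(big)))
system.time(as.numeric(levels(big))[big])
```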
>      > >>
>      > >> From the language definition - section 2.3.1:
>      > >>
>      > >> Factors are currently implemented using an integer array to specify the
>      > >> actual levels and a second array of names that are mapped to the
>      > >> integers. Rather unfortunately users often make use of the
>      > >> implementation in order to make some calculations easier. This,
>      > >> however, is an implementation issue and is not guaranteed to hold in
>      > >> all implementations of R.
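The implementation that section 2.3.1 describes can be inspected directly. This sketch, using an arbitrary example, shows the integer storage and the levels attribute. Code that relies on these internals is what the definition warns against, whereas levels() and as.character() are the documented accessors:

```r
f <- factor(c("b", "a", "b"))

typeof(f)       # "integer": the codes are stored as an integer vector
attributes(f)   # $levels "a" "b" and $class "factor"
unclass(f)      # codes 2 1 2, with the levels kept as an attribute
```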
>      > >>
>      > >
>      > > Hint:
>      > >
>      > >  f <- factor(sample(5, 10, TRUE))
>      > >  as.numeric(levels(f))[f]
>      > >
>      > >  g <- factor(sample(letters[1:5], 10, TRUE))
>      > >  as.numeric(levels(g))[g]
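Peter's hint uses sample(), so its output varies from run to run. A deterministic version of the same two cases, with arbitrarily chosen values, makes the point explicit: the idiom recovers numbers only when the level names are themselves numeric strings.

```r
f <- factor(c(3, 1, 3, 5))
as.numeric(levels(f))[f]     # 3 1 3 5

g <- factor(c("b", "a", "e"))
# as.numeric() on letters gives NA and warns "NAs introduced by coercion";
# the warning is suppressed here to keep the output short
suppressWarnings(as.numeric(levels(g))[g])   # NA NA NA
```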
>      > >
>      > > Peter Ehlers
>      > >
>      > >
>      > >
>      > >>
>      > >> ______________________________________________
>      > >> R-help at r-project.org mailing list
>      > >> https://stat.ethz.ch/mailman/listinfo/r-help
>      > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      > >> and provide commented, minimal, self-contained, reproducible code.
>      > >>
>      > >>
>      > >
>      >
>
>
>
>
>     --
>
>     Bert Gunter
>     Genentech Nonclinical Biostatistics
>
>     Internal Contact Info:
>     Phone: 467-7374
>     Website:
>     http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>


