[R] Error occurred during mean calculation of a column of a data frame, which is apparently contents numeric data
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Feb 29 15:08:25 CET 2012
On 29/02/2012 13:41, Duncan Murdoch wrote:
> On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
>> Factors are internally stored as integers (enums if you have used
>> other programming languages) with a special label set -- it's more
>> memory efficient than storing the whole string over and over.
>
> That was one of the original justifications, but character vectors are
> just as memory efficient these days.
No, not really. Character vectors (STRSXPs) store a pointer for each
string entry, whereas factors store an integer. On a 64-bit system a
pointer is 8 bytes and an integer 4, so a factor takes roughly half
the space:
    > a <- rep(letters[1:10], each = 1000)
    > object.size(a)
    80520 bytes
    > object.size(as.factor(a))
    41008 bytes
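
For anyone who wants to see where the difference comes from, here is a minimal sketch, assuming a 64-bit build of R. The variable name f is only for illustration, and exact byte counts depend on the R version and platform, so none are shown.

```r
a <- rep(letters[1:10], each = 1000)
f <- as.factor(a)

typeof(a)          # "character": one pointer per element
typeof(f)          # "integer": one integer code per element
levels(f)          # the 10 distinct labels, stored only once
head(as.integer(f))  # the underlying integer codes

# Ratio of the two footprints; on a 64-bit build this should come out
# at roughly one half (assumption: 8-byte pointers, 4-byte integers).
as.numeric(object.size(f)) / as.numeric(object.size(a))
```

The saving comes from the per-element storage. The labels themselves are stored once in either representation.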
> The other justifications are still valid: sometimes you have a vector
> which only takes on a subset of the possible values it could take, and
> when you tabulate it, you'd like to see those zero counts. You may also
> want to control the display order, and a factor allows that.
>
> For example:
>
> x <- c("a", "a", "b")
> table(x)
> x <- factor(x, levels=c("c", "b", "a"))
> table(x)
>
> Duncan Murdoch
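
To make the quoted example concrete, here is a small sketch of what the two table() calls count. The names xf, t1 and t2 are introduced here only for illustration.

```r
x <- c("a", "a", "b")
t1 <- table(x)    # only values that occur: a = 2, b = 1

# Declare "c" as a possible level and put the levels in the desired order.
xf <- factor(x, levels = c("c", "b", "a"))
t2 <- table(xf)   # every level, in level order: c = 0, b = 1, a = 2

names(t2)         # "c" "b" "a"
as.vector(t2)     # 0 1 2
```

The zero count for "c" appears only because the level was declared explicitly. A plain character vector has no way to record a value that never occurs.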
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595