[R] stringsAsFactor global option (was "character coerced to a factor")
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Apr 23 15:48:03 CEST 2007
On Mon, 23 Apr 2007, Terry Therneau wrote:
> --- Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>
>> Just one caveat. I personally would try to avoid using
>> global options since they can cause conflicts when
>> two different programs assume two different settings
>> of the same global option and need to interact.
>
> I see this argument often, and don't buy it. In any case, for this
> particular option, the Mayo biostatistics group (~120 users) has had
> stringsAsFactors=F as a global default for 15+ years now with no ill effects.
> It is much less confusing for both new and old users.
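
For readers weighing the two positions above, here is a minimal sketch of the alternative to a global setting: passing `stringsAsFactors` explicitly to `data.frame()`, so the result does not depend on the session default. The column name and values are made up for illustration.

```r
# Passing stringsAsFactors explicitly makes the result independent of
# whatever the session's global option happens to be.
df_chr <- data.frame(id = c("a01", "b02"), stringsAsFactors = FALSE)
df_fac <- data.frame(id = c("a01", "b02"), stringsAsFactors = TRUE)

class(df_chr$id)  # "character"
class(df_fac$id)  # "factor"
```

Code that must interact with other programs can use the explicit argument and leave the global option to each user's preference.
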
>
> John Kane asked "Any idea what the rationale was for setting the
> option to TRUE?" When factors were first introduced, there was no option
> to turn them off. Reading between the lines of the white book (Statistical
> Models in S) that introduced them, this is my guess: they made perfect sense for
> the particular data sets that were being analysed by the authors at the time.
> Many of the defaults in the survival package, which I wrote, have exactly the
> same rationale --- so let us not be too harsh on an author for not foreseeing
> all the future consequences of a default!
>
> A place where factors really are a pain is when the patient id is a character
> string. When, for instance, you subset the data to do an analysis of only
> the females, having the data set `remember' all of the male id's (the original
> levels) is non-productive in dozens of ways. For other variables factors
> work well and have some nice properties. In general, I've found in my work
> (medical research) that factors are beneficial for about 1/5 of the character
> variables, a PITA for 1/4, and a wash for the rest; so I prefer to do any
> transformations myself.
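
The subsetting problem described above can be sketched as follows. The data frame `patients` and its values are hypothetical; `stringsAsFactors = TRUE` is passed explicitly so the example behaves the same whatever the session default is.

```r
# Hypothetical data: a character patient id stored as a factor.
patients <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = TRUE
)

# Subsetting rows keeps the full set of levels on the id column.
females <- patients[patients$sex == "F", ]
levels(females$id)   # "p1" "p2" "p3" "p4"
table(females$id)    # zero counts appear for the male ids

# Re-applying factor() discards the levels that no longer occur.
females$id <- factor(females$id)
levels(females$id)   # "p1" "p3"
```

Keeping the id as a character vector from the start avoids the clean-up step altogether.
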
>
> For the historically curious:
> In Splus, one originally fixed this with an override of the function
> as.data.frame.character <- as.data.frame.vector
> before they added the global option. In R, unfortunately, this override
> didn't work due to namespaces, and we had to wait for the option to be
> added. (Another damned-if-you-do, damned-if-you-don't issue. Normally you
> don't want users to be able to override a base function, because 9 times out
> of 10 they did it by accident and don't want it either. But when a user really
> does want to do so ...)
> does want to do so ...)
That is what 'assignInNamespace' is for (and it came in with namespaces).
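
A minimal sketch of that mechanism, applied to the override quoted above. It assumes the code is run at top level in an interactive session or script, and that both `as.data.frame.character` and `as.data.frame.vector` exist in the base namespace of the R version in use. It only demonstrates replacing and restoring the binding; the effect on `data.frame()` is not asserted, since the defaults of the `as.data.frame` methods have changed across R versions. The help page for `assignInNamespace` warns that it is not meant for production code.

```r
# Keep a copy of the original method so it can be restored afterwards.
original <- utils::getFromNamespace("as.data.frame.character", "base")

# Replace the binding inside the base namespace. A plain assignment with <-
# would only create a copy in the global environment, which code running
# inside the namespace does not look up first.
utils::assignInNamespace(
  "as.data.frame.character",
  utils::getFromNamespace("as.data.frame.vector", "base"),
  ns = "base"
)

# Check that the namespace now holds the replacement.
replaced <- identical(
  utils::getFromNamespace("as.data.frame.character", "base"),
  utils::getFromNamespace("as.data.frame.vector", "base")
)

# Restore the original definition so the session is left unchanged.
utils::assignInNamespace("as.data.frame.character", original, ns = "base")
```
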
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595