[R] factor with numeric names
Saiwing Yeung
saiwing at berkeley.edu
Wed Mar 25 13:46:22 CET 2009
Thank you both so much for the answers. I think I have a better handle
on this now. Yes, Loblolly$Seed is an ordered factor, but I didn't
realize that the default contrast for ordered factors is contr.poly.
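For anyone who wants to check this, here is a minimal sketch. It assumes a fresh R session in which options("contrasts") has not been changed, and it uses only base R and the built-in Loblolly data; the variable names are just for illustration.

```r
# Default contrasts: the first entry applies to unordered factors,
# the second to ordered factors
default_contrasts <- getOption("contrasts")
print(default_contrasts)

# Seed in the built-in Loblolly data is an ordered factor with 14 levels
seed_is_ordered <- is.ordered(Loblolly$Seed)
n_levels <- nlevels(Loblolly$Seed)

# contr.poly(14) has 13 columns named .L, .Q, .C, ^4, ..., ^13, which is
# where the Seed.L, Seed.Q, ... coefficient names come from
poly_names <- colnames(contr.poly(n_levels))
print(poly_names)
```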
I was then further confused because I didn't realize that the
coefficient names (not just the model) differ depending on whether
there is an intercept term, even though the contrasts setting for Seed
is "contr.poly" in both cases.
> lm(formula = height ~ age + Seed, data = Loblolly)

Call:
lm(formula = height ~ age + Seed, data = Loblolly)

Coefficients:
(Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
   -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
     Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
    0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
    Seed^11      Seed^12      Seed^13
    0.53906     -0.29803     -0.77254

> lm(formula = height ~ age + Seed - 1, data = Loblolly)

Call:
lm(formula = height ~ age + Seed - 1, data = Loblolly)

Coefficients:
    age  Seed329  Seed327  Seed325  Seed307  Seed331  Seed311  Seed315  Seed321
 2.5905  -3.3635  -3.0701  -1.7535  -2.3485  -2.6568  -2.0235  -1.3168  -2.4651
Seed319  Seed301  Seed323  Seed309  Seed303  Seed305
-0.7951  -0.4301  -0.1235   0.1049   0.4299   1.4382

This should have been obvious to me...
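The difference can be seen without fitting anything by looking at the design-matrix column names. This is only a sketch using base R's model.matrix() on the built-in Loblolly data. As far as I understand it, when the intercept is dropped the first factor in the formula is coded with one indicator column per level instead of its contrasts, which would explain the level-based names.

```r
# The design-matrix column names are what lm() reports as coefficient names
with_intercept    <- colnames(model.matrix(height ~ age + Seed, data = Loblolly))
without_intercept <- colnames(model.matrix(height ~ age + Seed - 1, data = Loblolly))

print(with_intercept)     # polynomial contrast columns: Seed.L, Seed.Q, ...
print(without_intercept)  # one indicator column per level: Seed329, Seed327, ...
```

Both matrices have the same number of columns, so the two fits describe the same model with different parameterizations.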
(For the sake of completeness) I think as.factor() doesn't change the
"ordered-ness", and neither does factor() unless ordered=FALSE is given.
# as.factor(Loblolly$Seed) doesn't remove the ordered-ness
> str(Loblolly$Seed)
Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...
> str(as.factor(Loblolly$Seed))
Ord.factor w/ 14 levels "329"<"327"<"325"<..: 10 10 10 10 10 10 13 13 13 13 ...
# this works though
> str(factor(Loblolly$Seed, ordered=FALSE))
Factor w/ 14 levels "329","327","325",..: 10 10 10 10 10 10 13 13 13 13 ...
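Putting this together, here are two ways that should give level-based coefficient names while keeping the intercept. This is a sketch, not necessarily the most elegant route. The names lob, fit_unordered and fit_treatment are made up for illustration, and the second option relies on the contrasts argument of lm().

```r
# Option 1: work on a plain data.frame copy with Seed made unordered
lob <- as.data.frame(Loblolly)
lob$Seed <- factor(lob$Seed, ordered = FALSE)
fit_unordered <- lm(height ~ age + Seed, data = lob)

# Option 2: leave Seed ordered, but request treatment contrasts for this fit
fit_treatment <- lm(height ~ age + Seed, data = Loblolly,
                    contrasts = list(Seed = "contr.treatment"))

# Both fits should report coefficients named after the seed levels
print(names(coef(fit_unordered)))
print(names(coef(fit_treatment)))
```

The second option leaves the data untouched, which may be preferable if the ordering of Seed is needed elsewhere.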
Saiwing
On Mar 21, 2009, at 3:35 PM, John Fox wrote:
> Dear Saiwing Yeung,
>
> You appear to be using orthogonal-polynomial contrasts (generated by
> contr.poly) for Seed, which suggests that Seed is either an ordered factor
> or that you've assigned these contrasts to it. Because Seed has 14 levels,
> you end up fitting a degree-13 polynomial. If Seed is indeed an ordered
> factor and you want to use contr.treatment instead then you could, e.g., set
> Loblolly$Seed <- as.factor(Loblolly$Seed). (If I'm right about Seed being an
> ordered factor, your solution worked because it changed Seed to a factor,
> not because it used non-numeric level names.)
>
> I hope this helps,
> John
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Saiwing Yeung
>> Sent: March-21-09 5:02 PM
>> To: r-help at r-project.org
>> Subject: [R] factor with numeric names
>>
>> Hi all,
>>
>> I have a pretty basic question about categorical variables, but I can't
>> seem to find an answer, so I am hoping someone here can help. I found
>> that if the factor level names are all numbers, fitting the model in lm
>> returns coefficient labels that are not very recognizable.
>>
>> # Example: let's just assume that we want to fit this model
>> fit <- lm(height ~ age + Seed, data=Loblolly)
>>
>> # See the category names are all mangled up here
>> fit
>>
>> Call:
>> lm(formula = height ~ age + Seed, data = Loblolly)
>>
>> Coefficients:
>> (Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
>>    -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
>>      Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
>>     0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
>>     Seed^11      Seed^12      Seed^13
>>     0.53906     -0.29803     -0.77254
>>
>> One possible solution I found is to rename the factor levels:
>>
>> seed.str <- paste("S", Loblolly$Seed, sep="")
>> seed.str <- factor(seed.str)
>> fit <- lm(height ~ age + seed.str, data=Loblolly)
>> fit
>>
>>
>>
>> Call:
>> lm(formula = height ~ age + seed.str, data = Loblolly)
>>
>> Coefficients:
>>  (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
>>      -0.4301        2.5905        0.8600        1.8683       -1.9183
>> seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
>>       0.5350       -1.5933       -0.8867       -0.3650       -2.0350
>> seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
>>       0.3067       -1.3233       -2.6400       -2.9333       -2.2267
>>
>>
>> Now it is actually possible to see which one is which, but it is kind of
>> lame. Can someone point me to a more elegant solution? Thank you so much.
>>
>> Saiwing Yeung
>>