[R] lm coefficients output confusing
Daniel Malter
daniel at umd.edu
Thu Aug 13 23:56:40 CEST 2009
I answered a similar question just hours ago. Your question indicates
either an unfamiliarity with R or, more generally, an unfamiliarity with
regression analysis, or both.
Your anova indicates that the overall model is significant. That is, your
model is an improvement over the null model with just an intercept. Thus,
the anova output tells you that there are significant differences between
the HR groups. This is also indicated by the F-test and p-value for the
F-test in the second model output (the summary.lm).
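To illustrate that the two outputs report the same overall test, here is a small sketch. It uses R's built-in PlantGrowth data as a stand-in (an assumption for illustration, not your rdata2); for a model with a single factor, the F value in anova() and the F-statistic line of summary() should coincide.

```r
# Illustration on the built-in PlantGrowth data (not the poster's rdata2).
fit <- lm(weight ~ group, data = PlantGrowth)

a <- anova(fit)     # ANOVA table, one row per model term
s <- summary(fit)   # coefficient table plus the overall F-test

f_anova   <- a[["F value"]][1]
f_summary <- unname(s$fstatistic["value"])

# p-value of the overall F-test, recomputed from the summary's F-statistic
p_summary <- unname(pf(s$fstatistic["value"], s$fstatistic["numdf"],
                       s$fstatistic["dendf"], lower.tail = FALSE))

all.equal(f_anova, f_summary)            # same F statistic
all.equal(a[["Pr(>F)"]][1], p_summary)   # same p-value
```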
The question of why R does not show HR2, or why it does not simply show HR,
indicates a lack of understanding of R or regression. The fact that there
are six coefficient estimates for HR3 through HR8 (seven including the
intercept) indicates that the HR variable is coded as a factor.
Consequently, R estimates the model using a dummy variable for each level of
HR except the baseline, which is coded zero (this is basically an analysis
of variance, not of covariance). By default, the baseline is the first level
of a factor-coded variable, i.e. the smallest value (or the first in
alphabetical order), which is HR=2 in your case. Since it is the baseline,
there is no separate coefficient for HR2; its group mean is the intercept.
All other estimated coefficients give the difference between the groups
HR=3 through HR=8 and the group in which HR=2.
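A minimal sketch of the dummy coding, using only the HR=2 and HR=5 rows copied from the posted data (the data frame name d is made up for this example). With just two groups the standard errors differ from the full seven-level model, but the intercept and the HR5 coefficient should reproduce the posted estimates.

```r
# Resting values for HR = 2 and HR = 5, copied from the posted data.
r2 <- c(0.6666667, 0.4666667, 0.8000000, 0.6333333, 0.7333333, 0.8000000,
        0.8666667, 0.7857143, 0.7826087, 0.6666667, 0.9166667, 0.6666667)
r5 <- c(0.3333333, 0.5365854, 0.6774194, 0.7142857, 0.6904762, 0.5483871,
        0.5952381, 0.4166667, 0.5666667, 0.5952381, 0.7894737, 0.7500000)

d <- data.frame(HR = factor(rep(c(2, 5), each = 12)), R = c(r2, r5))

levels(d$HR)                  # the first level is the baseline
head(model.matrix(~ HR, d))   # intercept column plus a 0/1 dummy named HR5

fit <- lm(R ~ HR, data = d)
coef(fit)                          # (Intercept) and HR5
c(mean(r2), mean(r5) - mean(r2))   # baseline mean and difference in means

# A different baseline can be chosen with relevel()
coef(lm(R ~ relevel(HR, ref = "5"), data = d))
```

The intercept is the mean of the HR=2 group and HR5 is the difference between the HR=5 and HR=2 means, which is why no HR2 row appears in the summary.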
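If you want to include HR as a numeric variable in the regression (i.e., as
a continuous covariate with a single slope), you have to convert it to
numeric. Because HR is currently a factor, use as.numeric(as.character(HR));
as.numeric(HR) alone returns the internal level codes, not the hour values.
Note, however, that your results for the factor-coded HR variable indicate
that the trend (if any) is not linear.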
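A short sketch of the conversion on a toy factor (HR_factor and HR_num are made-up names for illustration). The commented lines show how this might look with your rdata2, assuming HR is stored there as a factor.

```r
# Toy factor with hour-like levels (illustrative values only).
HR_factor <- factor(c(2, 3, 5, 8, 8))

as.numeric(HR_factor)                 # internal level codes, not the hours
as.numeric(as.character(HR_factor))   # the actual hour values

# With the poster's data frame this would be, for example:
# rdata2$HR_num <- as.numeric(as.character(rdata2$HR))
# summary(lm(R ~ HR_num, data = rdata2))   # one slope instead of six dummies
```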
For a comparison of all groups against one another following an analysis of
variance, I think there are other methods, such as the Bonferroni-Dunn test
(as a post-hoc test).
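As a hedged sketch of what such post-hoc comparisons can look like in base R, the example below again uses the built-in PlantGrowth data (an assumption for illustration, not your rdata2). Pairwise t-tests with a Bonferroni adjustment are closely related to the procedure named above; TukeyHSD() is a different but commonly used alternative.

```r
# Illustration on the built-in PlantGrowth data (not the poster's rdata2).

# All pairwise t-tests with a pooled SD and Bonferroni-adjusted p-values
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "bonferroni")
pw$p.value   # lower-triangular matrix of adjusted p-values

# Tukey's honest significant differences, another common post-hoc procedure
tk <- TukeyHSD(aov(weight ~ group, data = PlantGrowth))
tk$group     # columns: diff, lwr, upr, p adj
```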
Best,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ross Culloch
Sent: Thursday, August 13, 2009 4:46 PM
To: r-help at r-project.org
Subject: [R] lm coefficients output confusing
Hi all,
I have an issue with the lm() function regarding the listing of the
coefficients. My data are below, showing a list of hours (HR) relating to
the time spent resting (R) by an individual animal. I simply want to fit an
lm() and pass it to anova() to see if there is a significant difference in
resting between hours.
HR R
1 2 0.6666667
2 2 0.4666667
3 2 0.8000000
4 2 0.6333333
5 2 0.7333333
6 2 0.8000000
7 2 0.8666667
8 2 0.7857143
9 2 0.7826087
10 2 0.6666667
11 2 0.9166667
12 2 0.6666667
13 3 0.5294118
14 3 0.8541667
15 3 0.4583333
16 3 0.5882353
17 3 0.9347826
18 3 0.7878788
19 3 0.7857143
20 3 0.6944444
21 3 0.8333333
22 3 0.7450980
23 3 0.9230769
24 3 0.7222222
25 4 0.6571429
26 4 0.7241379
27 4 0.7391304
28 4 0.6571429
29 4 0.8000000
30 4 0.9130435
31 4 0.7187500
32 4 0.8437500
33 4 0.9230769
34 4 0.8571429
35 4 0.8695652
36 4 0.8888889
37 5 0.3333333
38 5 0.5365854
39 5 0.6774194
40 5 0.7142857
41 5 0.6904762
42 5 0.5483871
43 5 0.5952381
44 5 0.4166667
45 5 0.5666667
46 5 0.5952381
47 5 0.7894737
48 5 0.7500000
49 6 0.6268657
50 6 0.7187500
51 6 0.5500000
52 6 0.7164179
53 6 0.7656250
54 6 0.5869565
55 6 0.7164179
56 6 0.7031250
57 6 0.7230769
58 6 0.7462687
59 6 0.9200000
60 6 0.8536585
61 7 0.6379310
62 7 0.5357143
63 7 0.5227273
64 7 0.8000000
65 7 0.6724138
66 7 0.7083333
67 7 0.7241379
68 7 0.6938776
69 7 0.6545455
70 7 0.7931034
71 7 0.7560976
72 7 0.8684211
73 8 0.6727273
74 8 0.6000000
75 8 0.8333333
76 8 0.8181818
77 8 0.7818182
78 8 0.7647059
79 8 0.5818182
80 8 0.5918367
81 8 0.7450980
82 8 0.7818182
83 8 0.8048780
84 8 0.8684211
The script I'm using and its output are as follows:
> anova(rdayml <- lm(R ~ HR, data=rdata2, na.action=na.exclude))
Analysis of Variance Table
Response: R
          Df  Sum Sq Mean Sq F value  Pr(>F)
HR         6 0.25992 0.04332  3.1762 0.00774 **
Residuals 77 1.05021 0.01364
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> summary(rdayml <- lm(R ~ HR,data=rdata2))
Call:
lm(formula = R ~ HR, data = rdata2)
Residuals:
Min 1Q Median 3Q Max
-0.279725 -0.065416 0.005593 0.077486 0.201070
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.732082   0.033713  21.715   <2e-16 ***
HR3          0.005976   0.047678   0.125   0.9006
HR4          0.067232   0.047678   1.410   0.1625
HR5         -0.130935   0.047678  -2.746   0.0075 **
HR6         -0.013152   0.047678  -0.276   0.7834
HR7         -0.034807   0.047678  -0.730   0.4676
HR8          0.004971   0.047678   0.104   0.9172
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1168 on 77 degrees of freedom
Multiple R-squared: 0.1984, Adjusted R-squared: 0.1359
F-statistic: 3.176 on 6 and 77 DF, p-value: 0.00774
What I really don't understand is why the lm summary lists the hour numbers
in the coefficients of the lm, as opposed to just reading HR. On top of
that, if R does display the data like this, then I don't understand why it
omits hour 2. If I can get this to work correctly, can I use the p value to
determine which of the hours is significantly different from the others, so
that in this example hour 5 is significantly different? Or is it just a case
of using the p value from the anova to determine that there is a significant
difference between hours (in this case) and using a plot to determine which
hour(s) are likely to be the cause?
Any help or advice would be most useful!
Best wishes,
Ross
--
View this message in context:
http://www.nabble.com/lm-coefficients-output-confusing-tp24958398p24958398.html
Sent from the R help mailing list archive at Nabble.com.