[R] Testing equality of regression model on multiple groups
Clara Yuan
CYuan at transitchicago.com
Fri Dec 18 16:28:13 CET 2009
Hi Daniel,
Thanks for your thorough response. You are indeed correct that I was looking for (a), and the Chow test fits the bill exactly. However, I believe my method is equivalent to the Chow test: rather than summing the errors across regressions on different datasets, I formulate a single specification whose errors equal that sum. (There is one difference: my reduced and full models include dummies for the grouping variable, and the Chow test's do not.) What interfered with the method was that my factor was ordered and therefore was not treated as a set of dummy variables by R in my model specification. Thanks to David, who helped clarify R's behaviour on this point.
Once I unordered my factor, I found that the coefficient estimates were the same.
> data.ex$t = factor(data.ex$t, ordered = F)
> coef(lm.together)
(Intercept) t2 t3 t4 x1 x2
2.691272263 -0.975915716 -0.811480858 -0.763039039 0.107520721 0.054694784
t2:x1 t3:x1 t4:x1 t2:x2 t3:x2 t4:x2
-0.060271900 -0.063503572 -0.087683111 -0.015070147 0.005671774 -0.024606167
x1:x2 t2:x1:x2 t3:x1:x2 t4:x1:x2
0.002180749 -0.001815363 -0.002329575 -0.002921691
You can see that these reproduce the estimates from lm.separate. Demonstrating with the intercepts:
> sapply(lm.separate, coef)[1,]
1 2 3 4
2.691272 1.715357 1.879791 1.928233
> coef(lm.together)[1] + c(0, coef(lm.together)[2:4])
               t2       t3       t4
2.691272 1.715357 1.879791 1.928233
For reference, the method I'm using is described here:
www.stat.wisc.edu/~mchung/teaching/stat324/324.24.pdf
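For anyone following along later, the reduced-vs-full comparison can be reproduced end-to-end on simulated data (data.ex below is a made-up stand-in; only the structure matches my setup). The joint test I described is the nested-model F-test:

```r
# Simulated stand-in for data.ex: four groups, two regressors, no real group effect.
set.seed(1)
data.ex <- data.frame(
  t  = factor(rep(1:4, each = 50)),  # unordered factor -> treatment contrasts
  x1 = rnorm(200),
  x2 = rnorm(200)
)
data.ex$y <- 2 + 0.1 * data.ex$x1 + 0.05 * data.ex$x2 + rnorm(200)

lm.reduced  <- lm(y ~ x1 * x2, data = data.ex)      # one pooled specification
lm.together <- lm(y ~ t * x1 * x2, data = data.ex)  # group-specific coefficients

# F-test of the joint hypothesis that all group-specific terms are zero
anova(lm.reduced, lm.together)
```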
Thanks very much for everyone's help,
Clara
________________________________________
From: Daniel Malter [daniel at umd.edu]
Sent: Friday, December 18, 2009 4:22 AM
To: Clara Yuan; 'r-help at lists.R-project.org'
Subject: RE: [R] Testing equality of regression model on multiple groups
Hi, your question is unclear: do you want to test (a) whether two subsets of the
data have the same coefficients for an identical model, or (b) whether a couple
of coefficients change once you include additional regressors? The source of
the confusion is that your lm.separate and lm.together are not the same model
(i.e., they do not include the same regressors): the former is (y ~ x1 * x2),
the latter (y ~ t * x1 * x2), which is obviously different.
If (a) is your goal, you want to run a Chow test.
For that, you run the regression (y ~ x1 * x2) on the entire dataset and on the
two subsets of the data separately. If the pooled model produces a much larger
residual sum of squares than the two subset regressions combined, that is
evidence that the data-generating processes for the two subsets differ
significantly; in other words, the coefficients for the subsets are jointly
significantly different from each other. This is, as you said, an F-test. Look
here for further details:
http://en.wikipedia.org/wiki/Chow_test
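A minimal sketch of the two-group computation on simulated data (all names here are illustrative, not from your dataset):

```r
# Chow test: compare pooled RSS against the summed RSS of per-group fits.
set.seed(2)
d <- data.frame(g = rep(1:2, each = 60), x1 = rnorm(120), x2 = rnorm(120))
d$y <- 1 + 0.5 * d$x1 + rnorm(120)

rss <- function(fit) sum(resid(fit)^2)

fit.pooled <- lm(y ~ x1 * x2, data = d)              # restricted (pooled) model
fit.g1 <- lm(y ~ x1 * x2, data = subset(d, g == 1))  # unrestricted: group 1
fit.g2 <- lm(y ~ x1 * x2, data = subset(d, g == 2))  # unrestricted: group 2

k <- length(coef(fit.pooled))  # parameters per model
chow.F <- ((rss(fit.pooled) - rss(fit.g1) - rss(fit.g2)) / k) /
          ((rss(fit.g1) + rss(fit.g2)) / (nrow(d) - 2 * k))
p.value <- pf(chow.F, k, nrow(d) - 2 * k, lower.tail = FALSE)
```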
It strikes me that you probably want to compare multiple groups. If you
want to run (y ~ x1 * x2) interacted with the group indicator, i.e.
(y ~ group * x1 * x2), you should code your group indicator as "treatment
contrasts."
The t.L, t.Q, and t.C terms indicate (a) that t is a factor variable and (b)
that this factor is coded with orthogonal polynomial contrasts. This tests for
linear, quadratic, cubic, etc. influence of t on the dependent variable, which
makes sense if t has constant differences between the factor levels (like
10 cm, 15 cm, 20 cm, 25 cm). Otherwise, you probably want dummy-variable
coding, which is called "treatment contrasts" in R lingo.
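To see the two codings side by side (a quick illustrative sketch):

```r
# The same four-level factor under R's two default codings.
f.ordered   <- factor(1:4, ordered = TRUE)  # ordered -> contr.poly
f.unordered <- factor(1:4)                  # unordered -> contr.treatment

contrasts(f.ordered)    # columns .L, .Q, .C: orthogonal polynomial contrasts
contrasts(f.unordered)  # columns 2, 3, 4: 0/1 dummies, level 1 as baseline

# You can also override the coding without re-creating the factor:
# contrasts(f.ordered) <- contr.treatment(4)
```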
HTH,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Clara Yuan
Sent: Thursday, December 17, 2009 5:15 PM
To: r-help at lists.R-project.org
Subject: [R] Testing equality of regression model on multiple groups
Hello,
I'm trying to test for the joint equality of coefficients of the same model
across different subsets of the data (i.e., is it necessary to estimate the
same model on these different populations separately, or can I just estimate
the model once on the whole dataset?).
My plan is to use the F-test on the reduced model and the full model. By
full model, I mean a specification that mimics my regressions on separate
subsets of data, but I have found that the full model's coefficient
estimates don't correspond to my original model's estimates. I was under the
impression that they would be identical.
Original model:
> lm.separate = by(data.ex, data.ex$t, function(x) lm(y ~ x1 * x2, data = x))
Full model:
> lm.together = lm(y ~ t * x1 * x2, data = data.ex)
The data are grouped by t.
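Since my actual data aren't postable, here is a simulated stand-in with the same structure (the data-generating process is made up) that reproduces the behaviour:

```r
# Simulated stand-in for data.ex: t is an ordered factor, which is what
# produces the t.L/t.Q/t.C coefficient names below.
set.seed(4)
data.ex <- data.frame(
  t  = factor(rep(1:4, each = 50), ordered = TRUE),
  x1 = rnorm(200),
  x2 = rnorm(200)
)
data.ex$y <- 2 + 0.1 * data.ex$x1 + rnorm(200)

lm.separate <- by(data.ex, data.ex$t, function(x) lm(y ~ x1 * x2, data = x))
lm.together <- lm(y ~ t * x1 * x2, data = data.ex)

names(coef(lm.together))  # includes "t.L", "t.Q", "t.C" rather than t2, t3, t4
```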
When I examine the coefficients, I find that they are roughly in the same
ballpark, but not nearly identical:
> sapply(lm.separate, coef)
1 2 3 4
(Intercept) 2.691272263 1.7153565472 1.8797914048 1.9282332240
x1 0.107520721 0.0472488208 0.0440171489 0.0198376096
x2 0.054694784 0.0396246366 0.0603665574 0.0300886164
x1:x2 0.002180749 0.0003653858 -0.0001488267 -0.0007409421
> coef(lm.together)
(Intercept) t.L t.Q t.C x1
2.0536633597 -0.4750933962 0.5121787674 -0.2809269719 0.0546560750
x2 t.L:x1 t.Q:x1 t.C:x1 t.L:x2
0.0461936485 -0.0595422428 0.0180461803 -0.0174386682 -0.0118682844
t.Q:x2 t.C:x2 x1:x2 t.L:x1:x2 t.Q:x1:x2
-0.0076038969 -0.0194162097 0.0004140914 -0.0020749112 0.0006116237
t.C:x1:x2
-0.0003083657
(Also, why are the coefficients renamed to t.L, t.Q, etc instead of t.1,
t.2?)
What am I missing?
Thanks for the help,
Clara
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.