[R] plm "within" models: is the correct F-statistic reported?

Wed Mar 17 00:39:06 CET 2010

> Dear R users
> I get different F-statistic results for a "within" model, when using
> "time" or "twoways" effects in plm() [1] and when manually specifying
> the time control dummies [2].
> [1] vignette("plm")
> [2] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf

Well, the question is incomplete in a way. An F-statistic is always 
associated with testing a model against some restricted version of that 
model. And which restricted model is reasonable might vary with your 
application.

You used:

data("Grunfeld", package = "AER")
library("plm")
gr <- subset(Grunfeld, firm %in% c("General Electric", "General Motors", "IBM"))
pgr <- plm.data(gr, index = c("firm", "year"))

and then considered

gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within",
   effect = "individual")

which you correctly pointed out is equivalent to

gr_lm <- lm(invest ~ 0 + value + capital + firm, data = pgr)

The difference between the two is that in "gr_fe" the model knows that the 
parameters of interest are "value" and "capital" and that the 
firm-specific intercepts are nuisance parameters (or at least of less 
importance than value/capital).

In "gr_lm" however, the fitted model does not know about that. It just 
knows that you forced out the intercept (and doesn't check that a 
firm-specific intercept is in fact included).

Hence, when you call summary(), each fit assumes a different null model 
("no effects"). For gr_fe the null model just omits value/capital but 
keeps the firm-specific intercepts. For gr_lm not even the intercept is 
kept in the model. Thus:

gr_fe_null <- lm(invest ~ 0 + firm, data = pgr)
gr_lm_null <- lm(invest ~ 0, data = pgr)

Then, comparing the full model (gr_lm) against the different null models 
yields:

R> anova(gr_fe_null, gr_lm)
Analysis of Variance Table

Model 1: invest ~ 0 + firm
Model 2: invest ~ 0 + value + capital + firm
   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     57 1888946
2     55  243985  2   1644961 185.41 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R> anova(gr_lm_null, gr_lm)
Analysis of Variance Table

Model 1: invest ~ 0
Model 2: invest ~ 0 + value + capital + firm
   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     60 9553385
2     55  243985  5   9309400 419.71 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
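
As a rough check, both F-statistics can be reproduced by hand from the (rounded) RSS and residual df printed in the two tables above. The variable names below are only illustrative:

```r
# RSS and residual df copied from the anova() tables above (rounded values)
rss_full <- 243985;  df_full <- 55   # invest ~ 0 + value + capital + firm
rss_fe0  <- 1888946; df_fe0  <- 57   # invest ~ 0 + firm
rss_lm0  <- 9553385; df_lm0  <- 60   # invest ~ 0

# F = ((RSS_null - RSS_full) / (df_null - df_full)) / (RSS_full / df_full)
f_stat <- function(rss_null, df_null, rss_full, df_full) {
  ((rss_null - rss_full) / (df_null - df_full)) / (rss_full / df_full)
}

F_fe <- f_stat(rss_fe0, df_fe0, rss_full, df_full)
F_lm <- f_stat(rss_lm0, df_lm0, rss_full, df_full)

round(c(F_fe, F_lm), 2)  # 185.41 419.71
```

The denominator is the same in both cases. Only the null model, and hence the numerator, changes.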

> In the first case, plm(..., effect="individual"), F-statistic: 185.407
> and in the second F-statistic:  420, while all other regression
> coefficients and standard errors are the same. Which F-statistic
> should be considered?

It depends on what you want to test. But I doubt that the one reported in 
summary(gr_lm) tests a useful hypothesis/alternative: its null model sets 
all coefficients to zero, including the firm-specific intercepts.
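
To see which null model summary() uses for an lm() fit without an intercept, here is a minimal sketch on simulated data (not Grunfeld; all object names are made up for illustration):

```r
set.seed(1)
# Hypothetical panel: 3 groups with 20 observations each
g <- factor(rep(c("a", "b", "c"), each = 20))
x <- rnorm(60)
y <- c(10, 50, 100)[as.integer(g)] + 2 * x + rnorm(60)

full      <- lm(y ~ 0 + x + g)  # group-specific intercepts, no global intercept
null_zero <- lm(y ~ 0)          # empty model, not even an intercept
null_g    <- lm(y ~ 0 + g)      # keeps the group intercepts, drops x

F_summary <- unname(summary(full)$fstatistic["value"])
F_zero    <- anova(null_zero, full)$F[2]
F_within  <- anova(null_g, full)$F[2]

# The F-statistic in summary(full) is the one against the empty model
all.equal(F_summary, F_zero)
# The test of x given the group intercepts is a different statistic
c(F_zero = F_zero, F_within = F_within)
```

If the hypothesis of interest is "no effect of the regressors, given the group intercepts", then F_within is presumably the relevant number. It corresponds to the comparison against gr_fe_null above.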

Best,
Z