[R] Multicollinearity, plm, and omitting variables
A. John Woodill
johnwoodill at gmail.com
Thu Mar 19 18:36:31 CET 2015
I'm fitting a fixed effects model with plm and know that I'm dealing with
multicollinearity between two of the independent variables. I'm working on
identifying multicollinearity in models as practice and have identified
the variable with alias(), then verified with vif(). I was also able to use
kappa() to show a very large condition number, confirming the
multicollinearity.
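For readers who want to reproduce that kind of diagnosis, here is a minimal base-R sketch. It rests on several assumptions that are not in the post: the household effects are entered as dummies via factor(nh), the regressors are rebuilt compactly from the pattern in the dput further down, and the response y is a random placeholder (the aliasing pattern depends only on the design matrix). vif() is left out because the post does not say which package it came from.

```r
# Compact reconstruction of the regressors (15 households, 2 years each).
nh      <- rep(c(11054, 11061, 11081, 11101, 12021, 12035, 12051, 12054,
                 12081, 12121, 13014, 13015, 13021, 13025, 13035), each = 2)
year    <- rep(0:1, times = 15)
dfmfd98 <- rep(c(1, 0, 1, 0), times = c(10, 6, 10, 4))
dfmfdyr <- dfmfd98 * year

# Placeholder response (assumption): only the design matrix matters here.
set.seed(1)
y <- rnorm(30)

# Assumption: fixed effects as household dummies, listed before dfmfd98.
fit <- lm(y ~ factor(nh) + year + dfmfdyr + dfmfd98)

# alias() reports terms that are exact linear combinations of earlier ones.
print(alias(fit))

# Aliased terms also show up as NA coefficients.
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# kappa() estimates the condition number of the design matrix;
# a rank-deficient matrix gives a huge (possibly infinite) value.
X <- model.matrix(fit)
print(kappa(X))

# Rank below the number of columns signals exact collinearity.
design_rank <- qr(X)$rank
print(c(rank = design_rank, columns = ncol(X)))
```

Note that the order of terms matters in this sketch: lm() drops the later of a set of linearly dependent columns, so with factor(nh) listed last a household dummy would be flagged instead of dfmfd98.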
My question is: why does plm() omit this collinear variable from the
coefficients? There is no output clarifying why, and I couldn't find
anything in the documentation. Stata automatically omits this variable, and
I'm curious whether plm() does a check and then omits it. Does plm() run
checks for collinearity or any other problems before fitting a fixed effects
model? Why is the dfmfd98 variable being omitted in the example below?
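One check that may help narrow this down is whether each regressor varies within a household at all. The sketch below rebuilds the regressors compactly from the pattern in the dput (an assumption made for brevity) and applies a hand-written within transformation. The helper demean() is hypothetical and is not a claim about what plm() does internally.

```r
# Compact reconstruction of the regressors (15 households, 2 years each).
nh      <- rep(c(11054, 11061, 11081, 11101, 12021, 12035, 12051, 12054,
                 12081, 12121, 13014, 13015, 13021, 13025, 13035), each = 2)
year    <- rep(0:1, times = 15)
dfmfd98 <- rep(c(1, 0, 1, 0), times = c(10, 6, 10, 4))
dfmfdyr <- dfmfd98 * year

# Largest within-household variance of each regressor.
within_var <- sapply(
  list(year = year, dfmfdyr = dfmfdyr, dfmfd98 = dfmfd98),
  function(x) max(tapply(x, nh, var))
)
print(within_var)

# Hypothetical helper: subtract each household's mean (within transformation).
demean <- function(x) x - ave(x, nh)

# A regressor that is constant within every household demeans to all zeros,
# so it carries no information once household effects are removed.
dfmfd98_within <- demean(dfmfd98)
print(range(dfmfd98_within))
```

In this data dfmfd98 never changes within a household, so its demeaned column is identically zero, while year and dfmfdyr keep within-household variation. The same holds for nh itself, which is constant within household by construction.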
Stack Exchange Post :
http://stats.stackexchange.com/questions/141684/multicollinearity-plm-and-omitting-variables
Collinear variable: dfmfd98
Reproducible example :
dput :
data <- structure(list(lexptot = c(8.28377505197124, 9.1595012302023,
8.14707583238833,
9.86330744180814, 8.21391453619232, 8.92372556833205, 7.77219149815994,
8.58202430280175, 8.34096828565733, 10.1133857229336, 8.56482997492403,
8.09468633074053, 8.27040804817704, 8.69834992618814, 8.03086333985764,
8.89644392254136, 8.20990433577082, 8.82621293136669, 7.79379981225575,
8.16139809188569, 8.25549748271241, 8.57464947213076, 8.2714431846277,
8.72374048671495, 7.98522888221012, 8.56460042433047, 8.22778847721461,
9.15431416391622, 8.25261818916933, 8.88033778695326), year = c(0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), dfmfdyr = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0), dfmfd98 = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0), nh = c(11054L, 11054L, 11061L, 11061L, 11081L, 11081L, 11101L,
11101L, 12021L, 12021L, 12035L, 12035L, 12051L, 12051L, 12054L, 12054L,
12081L, 12081L, 12121L, 12121L, 13014L, 13014L, 13015L, 13015L, 13021L,
13021L, 13025L, 13025L, 13035L, 13035L)), .Names = c("lexptot", "year",
"dfmfdyr", "dfmfd98", "nh"), class = c("tbl_df", "data.frame"), row.names =
c(NA, -30L))
Regression Code :
library(plm)
lm <- plm(lexptot ~ year + dfmfdyr + dfmfd98 + nh, data = data, model =
"within", index = "nh")
summary(lm)
Output :
Oneway (individual) effect Within Model
Call:
plm(formula = lexptot ~ year + dfmfdyr + dfmfd98 + nh, data = data,
model = "within", index = "nh")
Balanced Panel: n=15, T=2, N=30
Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-4.75e-01 -1.69e-01  4.44e-16  1.69e-01  4.75e-01
Coefficients :
        Estimate Std. Error t-value Pr(>|t|)
year     0.47552    0.23830  1.9955  0.06738 .
dfmfdyr  0.34635    0.29185  1.1867  0.25657
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 5.7882
Residual Sum of Squares: 1.8455
R-Squared : 0.68116
Adj. R-Squared : 0.29517
F-statistic: 13.8864 on 2 and 13 DF, p-value: 0.00059322