[R] Multicollinearity, plm, and omitting variables

Thu Mar 19 18:36:31 CET 2015

I'm fitting a fixed-effects model with plm and know that I'm dealing with
multicollinearity between two of the independent variables. I'm working on
identifying multicollinearity in models as practice and have identified
the variable with alias(), then verified it with vif(). I was also able to use
kappa() to show a very large condition number, confirming the
multicollinearity.

My question is: why does plm() omit this multicollinear variable from the
coefficients? There is no output explaining why, and I couldn't find
anything in the documentation. Stata automatically omits such a variable, and
I'm curious whether plm() runs a similar check and then omits it. Does plm()
run any checks for collinearity or other problems before fitting a
fixed-effects model? Why is the dfmfd98 variable being omitted in the example
below?

Stack Exchange post:
http://stats.stackexchange.com/questions/141684/multicollinearity-plm-and-omitting-variables

Multicollinear variable: dfmfd98

Reproducible example:

dput:

data <- structure(list(lexptot = c(8.28377505197124, 9.1595012302023,
8.14707583238833,
9.86330744180814, 8.21391453619232, 8.92372556833205, 7.77219149815994,
8.58202430280175, 8.34096828565733, 10.1133857229336, 8.56482997492403,
8.09468633074053, 8.27040804817704, 8.69834992618814, 8.03086333985764,
8.89644392254136, 8.20990433577082, 8.82621293136669, 7.79379981225575,
8.16139809188569, 8.25549748271241, 8.57464947213076, 8.2714431846277,
8.72374048671495, 7.98522888221012, 8.56460042433047, 8.22778847721461,
9.15431416391622, 8.25261818916933, 8.88033778695326), year = c(0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), dfmfdyr = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0), dfmfd98 = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0), nh = c(11054L, 11054L, 11061L, 11061L, 11081L, 11081L, 11101L,
11101L, 12021L, 12021L, 12035L, 12035L, 12051L, 12051L, 12054L, 12054L,
12081L, 12081L, 12121L, 12121L, 13014L, 13014L, 13015L, 13015L, 13021L,
13021L, 13025L, 13025L, 13035L, 13035L)), .Names = c("lexptot", "year",
"dfmfdyr", "dfmfd98", "nh"), class = c("tbl_df", "data.frame"), row.names =
c(NA, -30L))
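A quick way to see why a regressor can vanish from a within model is to apply the within transformation by hand. The sketch below assumes the textbook fixed-effects transformation (subtracting each household's mean over time), rebuilds the relevant columns of the dput() data compactly with rep(), and uses a hypothetical helper, within_demean(), written only for this illustration.

```r
# Columns rebuilt from the dput() above: 15 households (nh) x 2 years.
nh <- rep(c(11054, 11061, 11081, 11101, 12021, 12035, 12051, 12054,
            12081, 12121, 13014, 13015, 13021, 13025, 13035), each = 2)
year    <- rep(c(0, 1), times = 15)
dfmfd98 <- rep(c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0), each = 2)
dfmfdyr <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
             0, 1, 0, 1, 0, 1, 0, 0, 0, 0)

# Hypothetical helper: within transformation, i.e. subtract each
# household's own mean from every observation of that household.
within_demean <- function(x, id) x - ave(x, id)

X <- cbind(year, dfmfdyr, dfmfd98, nh)
X_within <- apply(X, 2, within_demean, id = nh)

# A regressor with no variation inside any household becomes a column of
# zeros, leaving the fixed-effects estimator nothing to identify it from.
colSums(abs(X_within))
# returns c(year = 15, dfmfdyr = 10, dfmfd98 = 0, nh = 0)
```

Here dfmfd98 has the same value in both years for every nh, and nh is the panel index itself, so both demean to exactly zero. That is consistent with the output below, where neither appears in the coefficient table, and it points to dfmfd98 being time-invariant within nh (and hence perfectly collinear with the household fixed effects) as the reason it cannot be estimated.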

Regression code:

library(plm)

# Fixed-effects ("within") model with nh as the individual index
fe_model <- plm(lexptot ~ year + dfmfdyr + dfmfd98 + nh, data = data,
                model = "within", index = "nh")

summary(fe_model)

Output:

Oneway (individual) effect Within Model

Call:
plm(formula = lexptot ~ year + dfmfdyr + dfmfd98 + nh, data = data,
    model = "within", index = "nh")

Balanced Panel: n=15, T=2, N=30

Residuals :
     Min.   1st Qu.    Median   3rd Qu.      Max.
-4.75e-01 -1.69e-01  4.44e-16  1.69e-01  4.75e-01

Coefficients :
        Estimate Std. Error t-value Pr(>|t|)
year     0.47552    0.23830  1.9955  0.06738 .
dfmfdyr  0.34635    0.29185  1.1867  0.25657
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    5.7882
Residual Sum of Squares: 1.8455
R-Squared      :  0.68116
      Adj. R-Squared :  0.29517
F-statistic: 13.8864 on 2 and 13 DF, p-value: 0.00059322
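The two surviving estimates can be reproduced without plm, which makes the mechanism visible. This is a sketch, not plm's internal code: it assumes the standard within estimator (OLS on household-demeaned data with no intercept) and, for comparison, the equivalent dummy-variable (LSDV) regression. The helper dm() is hypothetical, and the columns are rebuilt from the dput() above.

```r
# Columns rebuilt from the dput() above: 15 households (nh) x 2 years.
lexptot <- c(8.28377505197124, 9.1595012302023, 8.14707583238833,
             9.86330744180814, 8.21391453619232, 8.92372556833205,
             7.77219149815994, 8.58202430280175, 8.34096828565733,
             10.1133857229336, 8.56482997492403, 8.09468633074053,
             8.27040804817704, 8.69834992618814, 8.03086333985764,
             8.89644392254136, 8.20990433577082, 8.82621293136669,
             7.79379981225575, 8.16139809188569, 8.25549748271241,
             8.57464947213076, 8.2714431846277, 8.72374048671495,
             7.98522888221012, 8.56460042433047, 8.22778847721461,
             9.15431416391622, 8.25261818916933, 8.88033778695326)
nh <- rep(c(11054, 11061, 11081, 11101, 12021, 12035, 12051, 12054,
            12081, 12121, 13014, 13015, 13021, 13025, 13035), each = 2)
year    <- rep(c(0, 1), times = 15)
dfmfd98 <- rep(c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0), each = 2)
dfmfdyr <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
             0, 1, 0, 1, 0, 1, 0, 0, 0, 0)

# Hypothetical helper: subtract each household's mean (within transformation).
dm <- function(x) x - ave(x, nh)

d <- data.frame(y = dm(lexptot), year = dm(year), dfmfdyr = dm(dfmfdyr),
                dfmfd98 = dm(dfmfd98), nh = dm(nh))

# OLS on the demeaned data, no intercept: same point estimates as the within
# model; the all-zero columns dfmfd98 and nh are reported as NA (aliased).
fit <- lm(y ~ 0 + year + dfmfdyr + dfmfd98 + nh, data = d)
coef(fit)

# Dummy-variable (LSDV) version: with the household dummies entered first,
# dfmfd98 and nh are linear combinations of them and are also reported as NA.
lsdv <- lm(lexptot ~ factor(nh) + year + dfmfdyr + dfmfd98 + nh)
coef(lsdv)[c("year", "dfmfdyr", "dfmfd98", "nh")]
```

In both fits year and dfmfdyr come out at about 0.47552 and 0.34635, matching the plm output, while dfmfd98 and nh come back as NA because they carry no variation within nh. plm() appears to drop such coefficients from the table rather than print them as NA, which would explain the silent omission. The standard errors from the demeaned lm() fit will not match plm's, because lm() does not subtract the degrees of freedom absorbed by the 15 household means (28 residual df versus 13 in the plm output); the LSDV fit does account for them and has 13 residual df.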
