[R] R: error using pvcm() on unbalanced panel data
Millo Giovanni
Giovanni_Millo at Generali.com
Fri Feb 26 11:24:39 CET 2010
Dear Liviu,
in general, pvcm is capable of fitting variable coefficients models on unbalanced data sets: e.g.,
> data(Grunfeld)
> grun<-Grunfeld[-1,] ## 'unbalance' it
> pvcm(inv~value+capital, data=grun)
Model Formula: inv ~ value + capital
Coefficients:
(Intercept) value capital
1 -193.77819 0.1272949 0.3765985
2 -49.19832 0.1748560 0.3896419
3 -9.95631 0.0265512 0.1516939
4 -6.18996 0.0779478 0.3157182
5 22.70712 0.1623777 0.0031017
6 -8.68554 0.1314548 0.0853743
7 -4.49953 0.0875272 0.1237814
8 -0.50939 0.0528941 0.0924065
9 -7.72284 0.0753879 0.0821036
10 0.16152 0.0045734 0.4373692
> pvcm(inv~value+capital, data=grun, model="random")
Model Formula: inv ~ value + capital
Coefficients:
(Intercept) value capital
-11.741754 0.085046 0.199632
so the problem must be within the dataset and the minimum-T requirements for fitting vc models. The error message may not be the friendliest, but it actually tells you where the problem is.
Let us look at the Hedonic data you used as an example: this is not a panel dataset but you can treat it as one, as you did, by grouping obs. for the same town. So your N is
> length(unique(Hedonic$townid))
[1] 92
and your "T" (although here it is not "time") is:
> summary(tapply(Hedonic$townid,Hedonic$townid,length))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 2.0 4.0 5.5 7.0 30.0
as you can see, there are many "single observation" towns
> which(tapply(Hedonic$townid,Hedonic$townid,length)==1)
1 10 11 12 13 15 34 45 50 51 52 53 65 66 69 70 73
1 10 11 12 13 15 34 45 50 51 52 53 65 66 69 70 73
So the "within" model cannot work, as it needs T>(K+1) for estimating the separate regressions for each town. Why the "random" vcm doesn't is less straightforward.
Let us try a "reduced" pvcm with K=9 on a subset of data with T>10:
> hedo<-Hedonic[(Hedonic$townid %in% which(tapply(Hedonic$townid,Hedonic$townid,length)>10)),]
> dim(hedo)
[1] 208 15
> Hed <- pvcm(mv ~ crim + zn + indus + chas + nox + rm + age + dis +rad
+ , data=hedo, model = "within",index = "townid")
This works, but gives bad coefficients, because as it turns out there are also many time-invariant variables in the dataset, and of course these are discarded in doing timewise regressions!
A closer look at the data reveals that 'chas' is a factor, 'zn' is either 0 or 20, 'age' is truncated at 100 and so on. Let's see what's in the town with the most obs.:
> tapply(hedo$townid,hedo$townid,length)
5 25 28 29 39 41 46 60 80 81 83 84 85
22 11 15 30 11 18 12 12 11 13 19 23 11
> summary(hedo[hedo$townid==29,])
mv crim zn indus chas
Min. : 9.376 Min. :1.127 Min. :0 Min. :19.58 no :23
1st Qu.: 9.655 1st Qu.:1.472 1st Qu.:0 1st Qu.:19.58 yes: 7
Median : 9.878 Median :2.152 Median :0 Median :19.58
Mean : 9.973 Mean :2.111 Mean :0 Mean :19.58
3rd Qu.:10.093 3rd Qu.:2.430 3rd Qu.:0 3rd Qu.:19.58
Max. :10.820 Max. :4.097 Max. :0 Max. :19.58
nox rm age dis
Min. :36.60 Min. :24.04 Min. : 79.20 Min. :0.2788
1st Qu.:36.60 1st Qu.:30.26 1st Qu.: 93.82 1st Qu.:0.4349
Median :75.86 Median :35.69 Median : 96.05 Median :0.5615
Mean :57.54 Mean :37.83 Mean : 95.16 Mean :0.5876
3rd Qu.:75.86 3rd Qu.:39.71 3rd Qu.: 98.42 3rd Qu.:0.7354
Max. :75.86 Max. :70.14 Max. :100.00 Max. :0.8862
rad tax ptratio blacks
Min. :1.609 Min. :403 Min. :14.7 Min. :0.08801
1st Qu.:1.609 1st Qu.:403 1st Qu.:14.7 1st Qu.:0.29349
Median :1.609 Median :403 Median :14.7 Median :0.35000
Mean :1.609 Mean :403 Mean :14.7 Mean :0.31745
3rd Qu.:1.609 3rd Qu.:403 3rd Qu.:14.7 3rd Qu.:0.37402
Max. :1.609 Max. :403 Max. :14.7 Max. :0.39690
lstat townid
Min. :-4.058 Min. :29
1st Qu.:-2.534 1st Qu.:29
Median :-2.064 Median :29
Mean :-2.186 Mean :29
3rd Qu.:-1.801 3rd Qu.:29
Max. :-1.220 Max. :29
whence we see that 'zn', 'indus', 'rad', 'tax', 'ptratio' and, of course, 'townid' are T-invariant, 'chas' is a factor but at least it varies between yes and no. 'nox' is also problematic in that it varies only from time to time...
A feasible formula, whatever this model means, is:
> fm <- mv ~ crim + rm + age + dis + blacks + lstat
> newmodr <- pvcm(fm, data=hedo, model="random", index="townid")
> newmodw <- pvcm(fm, data=hedo, model="within", index="townid")
> ## all is well now
I hope hereby to have given you some methodological hint for a critical overview of your data. PS the pooltest() problem is much the same, as pooltest() needs to fit separate regressions.
Best,
Giovanni
-----Messaggio originale-----
Da: Liviu Andronic [mailto:landronimirc at gmail.com]
Inviato: giovedì 25 febbraio 2010 16:24
A: r-help at r-project.org Help
Cc: Millo Giovanni; yves.croissant at let.ish-lyon.cnrs.fr
Oggetto: error using pvcm() on unbalanced panel data
Dear all
I am trying to fit Variable Coefficients Models on Unbalanced Panel Data. I managed to fit such models on balanced panel data (the example from the "plm" vignette), but I failed to do so on my real, unbalanced panel data.
I can reproduce the error on a modified example from the vignette:
> require(plm)
> data("Hedonic")
> Hed <- pvcm(mv ~ crim + zn + indus + chas + nox + rm + age + dis +rad
> + tax + ptratio + blacks + lstat, Hedonic, model = "within",index =
> "townid")
Error in FUN(X[[1L]], ...) : insufficient number of observations
> ##it fails for both FE and RE cases
> Hed <- pvcm(mv ~ crim + zn + indus + chas + nox + rm + age + dis +rad
> + tax + ptratio + blacks + lstat, Hedonic, model = "random",index =
> "townid")
Error in FUN(X[[1L]], ...) : insufficient number of observations
Would this be expected behaviour for unbalanced data? The vignette warns of several limitations regarding such data, but doesn't mention this specific case as a limitation. I would like to subsequently perform a test of poolability on my real data.
Thank you
Liviu
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] plm_1.2-3 sandwich_2.2-5 zoo_1.6-2 MASS_7.3-5
[5] Formula_0.2-0 kinship_1.1.0-23 lattice_0.18-3 nlme_3.1-96
[9] survival_2.35-8 fortunes_1.3-7 RGtk2_2.12.18 cairoDevice_2.10
[13] sos_1.2-5 brew_1.0-3 hints_1.0.1-1
loaded via a namespace (and not attached):
[1] grid_2.10.1 tools_2.10.1
Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:13}}
More information about the R-help
mailing list