[R] Debugging multiple imputation in mice

Harish Narayanan harish.mlists at gmail.com
Mon Jul 25 21:14:32 CEST 2011


Hello all,

I am trying to impute some missing data using the mice package. The data
set I am working with has 190 observations of 125 variables, a mix of
categorical and continuous. Some of these variables are missing up to
30% of their values.

I am running into a peculiar problem which is illustrated by the
following example showing both the original data (blue) and the imputed
values (red).

http://home.simula.no/~harish/files/tmp/imputation-error.pdf

As the plot shows, mice seems to favour two or three distinct values in
each of the ten imputations. I would have expected the imputed values to
be more spread out. I observe this behaviour for every imputed variable
that I have looked at (there are ~80 imputed variables in total).
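
One way to quantify this, rather than reading it off the plot, is to count
the distinct imputed values per imputation. The following is a minimal
sketch only: it uses the small nhanes data set that ships with mice as a
stand-in for the real data, and the variable bmi and the seed are
illustrative assumptions. Note that pmm imputes by copying observed values,
so the number of distinct imputed values can never exceed the number of
distinct observed values.

```r
library(mice)

# nhanes ships with mice; it stands in for the real data set here
imp <- mice(nhanes, m = 10, method = "pmm", seed = 1, printFlag = FALSE)

# imp$imp$bmi holds the imputed bmi values:
# one row per missing case, one column per imputation
imputed_bmi <- imp$imp$bmi

# Number of distinct imputed values in each of the ten imputations
distinct_per_imputation <- sapply(imputed_bmi, function(x) length(unique(x)))
print(distinct_per_imputation)

# pmm copies observed values, so every imputed value should also
# occur among the observed ones
all_from_observed <- all(unlist(imputed_bmi) %in% nhanes$bmi)
print(all_from_observed)

# Observed and imputed values side by side, one strip per imputation
print(stripplot(imp, bmi ~ .imp, pch = 20, cex = 1.2))
```

Comparing these counts with length(unique(na.omit(x))) for the observed
values of the same variable shows whether the imputations are drawing from
only a small part of the available donors.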

I have tried both specifying the predictors myself through a predictor
matrix and leaving mice to choose sensible defaults. I have also tried
increasing the number of iterations per imputation, hoping that would
help the algorithm (pmm) converge to a different solution, but that did
not change the imputations either.
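
For reference, the two setups described above can be sketched roughly as
follows. This is not the exact code that was run: nhanes again stands in
for the real data, and the quickpred() threshold, the iteration count and
the seed are assumptions chosen for illustration.

```r
library(mice)

# (a) Let mice choose its default predictor matrix
imp_default <- mice(nhanes, m = 10, maxit = 20, method = "pmm",
                    seed = 1, printFlag = FALSE)

# (b) Build a predictor matrix explicitly; quickpred() keeps only
# predictors whose absolute correlation with the target (or with its
# missingness indicator) is at least mincor
pred <- quickpred(nhanes, mincor = 0.1)
print(pred)

imp_pred <- mice(nhanes, m = 10, maxit = 20, method = "pmm",
                 predictorMatrix = pred, seed = 1, printFlag = FALSE)

# Trace plots of the mean and standard deviation of the imputed values
# per iteration, one line per imputation, to judge convergence
plot(imp_pred)
```

In the trace plots the ten lines should mix freely without a trend; lines
that stay flat or well separated would suggest the chains are not moving.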

Could you please point me to where I should look to debug this
behaviour? I have been going through the recent mice manual[1], but is
there something in particular I should be looking at? A bigger question,
I guess, is whether I should also be experimenting with other packages
such as Amelia and mi.

Thanks,
Harish

[1] http://www.stefvanbuuren.nl/publications/MICE%20in%20R%20-%20Draft.pdf


