[R] multiple imputation with mix package
Kurt Smith
smith.kurt.a at gmail.com
Thu Oct 22 01:57:13 CEST 2009
I am running into a problem using 'mix' for multiple imputation (with both
continuous and categorical variables).
Because of how I will be using this, I would like to build an imputation
model on a training data set and then use that model to impute
missing values for a different set of individuals (i.e. I need to have
the model in place before I receive their information).
I expected that all of the model information would be stored in a single
object. In other words, when I run:
imp.mix(s, theta, x)
I expected that theta completely specifies the model, and that s (created by
prelim.mix) and x contain only information about the incomplete data set I
want to impute.
As far as I can tell, this is not the case. For instance, I create a
'model object' (a general location parameter list) from the data set
trainSet using em.mix as follows:
sTrain <- prelim.mix(trainSet,nCategorical)
thetaTrain <- em.mix(sTrain,maxits=1000,showits=TRUE,eps=0.0001)
I then attempt to use this model to impute a missing field (TC) in the
data set workSet as follows:
workSet$TC <- NA
sWork <- prelim.mix(workSet,nCategorical)
imputedWork <- imp.mix(sWork,thetaTrain,workSet)
This does not give realistic values for TC (they are around 0). My
guess is that part of the imputation model information is stored
in sWork.
If I do the following, it seems to work better (the values of TC are in the correct range):
sWork$xbar = sTrain$xbar
sWork$sdv = sTrain$sdv
imputedWork <- imp.mix(sWork,thetaTrain,workSet)
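To make the scale issue concrete, here is a toy sketch in base R that does not call mix at all. It assumes, as the xbar/sdv experiment suggests, that prelim.mix centers and scales the continuous variables and that thetaTrain is expressed on that standardized scale. All numbers (train.TC, z.draws) are made up for illustration.

```r
# Toy illustration in base R; no mix functions are called.
# Assumption: the imputation model works on standardized continuous
# variables, so imputed draws must be mapped back with a mean and an sd.

train.TC   <- c(180, 200, 220, 240)   # made-up training values of TC
train.xbar <- mean(train.TC)          # 210
train.sdv  <- sd(train.TC)            # about 25.8

z.draws <- c(-0.5, 0, 0.5)            # made-up imputed values on the standardized scale

# Back-transform with the training mean and sd: values fall around 210
tc.train.scale <- z.draws * train.sdv + train.xbar

# Back-transform with mean 0 and sd 1 (i.e. no rescaling): values stay around 0
tc.no.scale <- z.draws * 1 + 0

print(round(tc.train.scale, 1))
print(tc.no.scale)
```

Only base R is used so the scale effect can be checked without the mix package or the actual data.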
Can someone say whether what I am doing is correct? Is changing xbar
and sdv by hand a proper solution? I'm not sure whether that could
mess other things up.
Thanks,
Kurt