[R] Multiple imputation, especially in rms/Hmisc packages
Mark Seeto
Mark.Seeto at nal.gov.au
Tue Aug 10 05:49:34 CEST 2010
Thank you for your reply Frank. I am not familiar with the contrast
test, but I'll see what I can find out about it.
Mark
Frank Harrell wrote:
>
> On Mon, 9 Aug 2010, Mark Seeto wrote:
>
>> Hello, I have a general question about combining imputations as well
>> as a
>> question specific to the rms and Hmisc packages.
>>
>> The situation is multiple regression on a data set where multiple
>> imputation has been used to give M imputed data sets. I know how to get
>> the combined estimate of the covariance matrix of the estimated
>> coefficients (average the M covariance matrices from the individual
>> imputations and add (M+1)/M times the between-imputation covariance
>> matrix), and I know how to use this to get p-values and confidence
>> intervals for the individual coefficients.
>>
>> What I don't know is how to combine imputations to obtain the multiple
>> degree-of-freedom tests to test whether a set of coefficients are all 0.
>> If there were no missing values, this would be done using the "general
>> linear test" which involves fitting the full model and the reduced
>> model and getting an F statistic based on the residual sums of
>> squares and
>> degrees of freedom. How does one correctly do this when there are
>> multiple
>> imputations? In the language of Hmisc, I'm asking how the values in
>> anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've
>> looked at the anova.rms code but couldn't understand it).
>
> In ordinary regression, the "leave some variables out and compare
> R^2"-based F-test and the quicker contrast tests are equivalent. In
> general we use the latter, which is expedient since it is easy to
> estimate the imputation-corrected covariance matrix. In the rms and
> Hmisc packages, fit.mult.impute creates the needed fit object, then
> anova, contrast, summary, etc. give the correct Wald tests.
>
>
> >
>> The second question I have relates to rms/Hmisc specifically. When I use
>> fit.mult.impute, the fitted values don't appear to match the values
>> given
>> by the predict function on the original data. Instead, they match the
>> fitted values of the last imputation. For example,
>>
>> library(rms)
>> set.seed(1)
>> x1 <- rnorm(40)
>> x2 <- 0.5*x1 + rnorm(40,0,3)
>>
>> y <- x1^2 + x2 + rnorm(40,0,3)
>> x2[40] <- NA # create 1 missing value in x2
>> a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3) # 2 imputations
>>
>> x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1] # first imputed x2
>> vector
>> x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2] # second imputed x2
>> vector
>>
>> ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1) # model on imputation 1
>> ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2) # model on imputation 2
>>
>> d <- data.frame(y, x1, x2)
>> fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)
>>
>> fmi$fitted
>> predict(fmi, d)
>>
>> I would have expected the results of fmi$fitted and predict(fmi,d) to be
>> the same except for the final NA, but they are not. Instead,
>> fmi$fitted is
>> the same as ols.imp2$fitted. Also,
>>
>> anova(ols.imp2)
>> anova(fmi)
>>
>> unexpectedly give the same result in the ERROR line. I would be most
>> grateful if anyone could explain this to me.
>
> I need to add some more warnings. For many of the calculations, the
> last imputation is used.
>
> Frank
>
>>
>> Thanks,
>> Mark
>> --
>> Mark Seeto
>> Statistician
>>
>> National Acoustic Laboratories <http://www.nal.gov.au/>
>> A Division of Australian Hearing
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
More information about the R-help
mailing list