[R] Multiple imputation, especially in rms/Hmisc packages

Tue Aug 10 05:49:34 CEST 2010

Thank you for your reply Frank. I am not familiar with the contrast 
test, but I'll see what I can find out about it.

Mark

Frank Harrell wrote:
>
> On Mon, 9 Aug 2010, Mark Seeto wrote:
>
>> Hello, I have a general question about combining imputations as well 
>> as a
>> question specific to the rms and Hmisc packages.
>>
>> The situation is multiple regression on a data set where multiple
>> imputation has been used to give M imputed data sets. I know how to get
>> the combined estimate of the covariance matrix of the estimated
>> coefficients (average the M covariance matrices from the individual
>> imputations and add (M+1)/M times the between-imputation covariance
>> matrix), and I know how to use this to get p-values and confidence
>> intervals for the individual coefficients.
>>
>> What I don't know is how to combine imputations to obtain the multiple
>> degree-of-freedom tests to test whether a set of coefficients are all 0.
>> If there were no missing values, this would be done using the "general
>> linear test" which involves fitting the full model and the reduced
>> model and getting an F statistic based on the residual sums of 
>> squares and
>> degrees of freedom. How does one correctly do this when there are 
>> multiple
>> imputations? In the language of Hmisc, I'm asking how the values in
>> anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've
>> looked at the anova.rms code but couldn't understand it).
>
> In ordinary regression, the "leave some variables out and compare 
> R^2"-based F-test and the quicker contrast tests are equivalent.   In 
> general we use the latter, which is expedient since it is easy to 
> estimate the imputation-corrected covariance matrix.  In the rms and 
> Hmisc packages, fit.mult.impute creates the needed fit object, then 
> anova, contrast, summary, etc. give the correct Wald tests.
>
>
>  >
>> The second question I have relates to rms/Hmisc specifically. When I use
>> fit.mult.impute, the fitted values don't appear to match the values 
>> given
>> by the predict function on the original data. Instead, they match the
>> fitted values of the last imputation. For example,
>>
>> library(rms)
>> set.seed(1)
>> x1 <- rnorm(40)
>> x2 <- 0.5*x1 + rnorm(40,0,3)
>>
>> y <- x1^2 + x2 + rnorm(40,0,3)
>> x2[40] <- NA   # create 1 missing value in x2
>> a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3) # 2 imputations
>>
>> x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]  # first imputed x2 
>> vector
>> x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]  # second imputed x2 
>> vector
>>
>> ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1)  # model on imputation 1
>> ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2)  # model on imputation 2
>>
>> d <- data.frame(y, x1, x2)
>> fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)
>>
>> fmi$fitted
>> predict(fmi, d)
>>
>> I would have expected the results of fmi$fitted and predict(fmi,d) to be
>> the same except for the final NA, but they are not. Instead, 
>> fmi$fitted is
>> the same as ols.imp2$fitted. Also,
>>
>> anova(ols.imp2)
>> anova(fmi)
>>
>> unexpectedly give the same result in the ERROR line. I would be most
>> grateful if anyone could explain this to me.
>
> I need to add some more warnings.  For many of the calculations, the 
> last imputation is used.
>
> Frank
>
>>
>> Thanks,
>> Mark
>> -- 
>> Mark Seeto
>> Statistician
>>
>> National Acoustic Laboratories <http://www.nal.gov.au/>
>> A Division of Australian Hearing
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>