[R] Is there a good package for multiple imputation of missing values in R?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Mon Jun 30 20:25:13 CEST 2008
Robert A LaBudde wrote:
> At 03:02 AM 6/30/2008, Robert A. LaBudde wrote:
>> I'm looking for a package that has a state-of-the-art method of
>> imputation of missing values in a data frame with both continuous and
>> factor columns.
>>
>> I've found transcan() in 'Hmisc', which appears to be possibly suited
>> to my needs, but I haven't been able to figure out how to get a new
>> data frame with the imputed values replaced (I don't have Harrell's
>> book).
>>
>> Any pointers would be appreciated.
>
> Thanks to "paulandpen", Frank and Shige for suggestions.
>
> I looked at the packages 'Hmisc', 'mice', 'Amelia' and 'norm'.
>
> I still haven't mastered the methodology for using aregImpute() in
> 'Hmisc' based on the help information. I think I'll have to get hold of
> Frank's book to see how it's used in a complete example.
It's not in the book; it will be in the 2nd edition someday.
Frank
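
For readers in the same position, here is a minimal sketch of one way aregImpute() can be combined with fit.mult.impute() from 'Hmisc'. It is not taken from the thread or from the book. The data frame 'd' and its columns (y, x1, g) are hypothetical and simulated only so the example runs on its own, and nk = 0 (linear terms only) is an assumption chosen to keep the small example stable.

```r
library(Hmisc)

set.seed(2)
n <- 300

# Hypothetical data (not from the thread): binary outcome y,
# continuous predictor x1, two-level factor g
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b"), n, replace = TRUE))
y  <- rbinom(n, 1, plogis(0.5 * x1 + 0.8 * (g == "b")))

# Introduce missing values in the predictors only
x1[sample(n, 40)] <- NA
g[sample(n, 30)]  <- NA
d <- data.frame(y, x1, g)

# Multiple imputation: 5 imputations, linear terms only (nk = 0)
a <- aregImpute(~ y + x1 + g, data = d, n.impute = 5, nk = 0, pr = FALSE)

# Fit the model once per imputation and combine the results;
# extra arguments (family) are passed on to the fitter, glm
fmi <- fit.mult.impute(y ~ x1 + g, glm, a, data = d,
                       family = binomial, pr = FALSE)

print(coef(fmi))              # coefficients averaged over imputations
print(sqrt(diag(vcov(fmi))))  # standard errors adjusted for imputation
```

With a non-rms fitter such as glm, the imputation-adjusted covariance matrix comes from vcov(fmi); summary(fmi) would report standard errors from the last imputation only.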
>
> 'Amelia' and 'norm' appear to be focused solely on continuous,
> multivariate normal variables, but my needs typically involve datasets
> with both factors and continuous variables.
>
> The function mice() in 'mice' appears to best suit my needs, and the
> help file was intelligible, and it works on both factors and continuous
> variables.
>
> For those in the audience with similar issues, here is a code snippet
> showing how some of these functions work ('felon' is a data frame with
> categorical and continuous predictors of the binary variable 'hired'):
>
> library('mice') #missing data imputation library for md.pattern(), mice(), complete()
> names(felon) #show variable names
> md.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
>
> library('Hmisc') #package for na.pattern() and impute()
> na.pattern(felon[,1:4]) #show patterns for missing data in 1st 4 vars
>
> #simple imputation can be done by
> felon2<- felon #make copy
> felon2$felony<- impute(felon2$felony) #impute NAs (most frequent)
> felon2$gender<- impute(felon2$gender) #impute NAs
> felon2$natamer<- impute(felon2$natamer) #impute NAs
> na.pattern(felon2[,1:4]) #show no NAs left in these vars
> fit2<- glm(hired ~ felony + gender + natamer, data=felon2, family=binomial)
> summary(fit2)
>
> #better, multiple imputation can be done via mice():
> imp<- mice(felon[,1:4]) #do multiple imputation (default is 5 realizations)
> for (iSet in seq_len(imp$m)) { #show results for each imputed dataset (5 by default)
> fit<- glm(hired ~ felony + gender + natamer,
> data=complete(imp, iSet), family=binomial) #fit to iSet-th realization
> print(summary(fit))
> }
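
The loop above prints one fit per imputed dataset but does not combine them. A possible next step, sketched here under assumptions, is to pool the fits with Rubin's rules using with() and pool() from 'mice'. The 'felon' data frame is not available, so a simulated stand-in with the same column names is built below; its values are hypothetical.

```r
library(mice)

set.seed(1)
n <- 200

# Hypothetical stand-in for 'felon' (simulated, not the original data)
felon <- data.frame(
  hired   = rbinom(n, 1, 0.5),
  felony  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  natamer = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Introduce missing values in the predictors
felon$felony[sample(n, 20)]  <- NA
felon$gender[sample(n, 15)]  <- NA
felon$natamer[sample(n, 10)] <- NA

# Multiple imputation with 5 imputed datasets
imp <- mice(felon, m = 5, printFlag = FALSE, seed = 1)

# Fit the same model to every imputed dataset
fits <- with(imp, glm(hired ~ felony + gender + natamer, family = binomial))

# Combine estimates and standard errors across imputations
pooled <- pool(fits)
print(summary(pooled))
```

The pooled summary gives a single set of estimates and standard errors that reflect both within-imputation and between-imputation variability, which the five separate summaries do not.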
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd. URL: http://lcfltd.com/
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239 Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University