[R] Multiple imputation and PCA

Peter Ho peter at esb.ucp.pt
Wed Aug 25 16:47:53 CEST 1999

Dear all,

I have a question regarding the use of multiple imputation followed by a PCA
on the imputed datasets. As I have a dataset with missing values, applying a PCA
directly would possibly mean deleting some of the variables. Multiple imputation
was therefore done on the original dataset to fill in the missing values. I first
used the EM algorithm, followed by the DA (data augmentation) algorithm, using
the program NORM by Schafer.
I chose to generate 5 imputed datasets. The datasets have identical values, except
for the imputed missing values. Using 5 datasets allows the uncertainty in the
missing values to be taken into account.
Now the question.
What would be the correct procedure to do a PCA, or any other analysis, so as to
produce estimates and their standard errors? I have followed the suggestion of
Schafer and decided to do individual PCAs on all 5 datasets. This generates a set
of loadings and scores for each one. The idea would then be to combine the loadings
and scores to get an average estimate of both loadings and scores and their
standard errors.
A quick look at the summary and the scree plot of all 5 analyses shows that they
are generally all the same. Of course, the loadings and scores themselves differ
between datasets.
Is this the correct way to analyse multiply imputed datasets with PCA? Or should
the data from all 5 datasets be averaged into one dataset and only one PCA be
performed?
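
The per-dataset approach described above could be sketched in R roughly as
follows. This is only an illustration under stated assumptions: the five
"imputed" datasets are simulated stand-ins (not NORM output), the names
`imputed`, `aligned`, `pooled_loadings` and `between_sd` are made up for the
example, and the sign alignment step is one possible convention, needed because
the sign of each principal component is arbitrary.

```r
set.seed(1)

# Hypothetical stand-in for the five imputed datasets:
# identical observed values, different values in the "missing" cells.
n <- 50; p <- 4
complete <- matrix(rnorm(n * p), n, p) %*% chol(0.5 + 0.5 * diag(p))
colnames(complete) <- paste0("V", 1:p)
miss <- matrix(runif(n * p) < 0.1, n, p)   # about 10% of cells treated as missing

imputed <- lapply(1:5, function(m) {
  d <- complete
  d[miss] <- d[miss] + rnorm(sum(miss), sd = 0.5)  # fake imputation noise
  d
})

# One PCA per imputed dataset.
fits <- lapply(imputed, princomp, cor = TRUE)

# Align the sign of each component with the first fit before averaging;
# otherwise a flipped component would cancel out in the mean.
ref <- unclass(loadings(fits[[1]]))
aligned <- lapply(fits, function(f) {
  L <- unclass(loadings(f))
  flip <- sign(colSums(L * ref))
  sweep(L, 2, flip, "*")
})

# Average across imputations, and measure the between-imputation spread.
arr <- simplify2array(aligned)               # p x p x 5 array
pooled_loadings <- apply(arr, c(1, 2), mean)
between_sd      <- apply(arr, c(1, 2), sd)

round(pooled_loadings, 3)
round(between_sd, 3)
```

Note that `between_sd` reflects only the variation between imputations. A full
standard error under Rubin's rules, T = W + (1 + 1/m) B, would also need a
within-imputation variance W for each loading, which `princomp` does not
provide; a bootstrap within each dataset would be one way to obtain it. The
sketch also assumes the components come out in the same order in every
dataset, which may fail when eigenvalues are close, and the averaged loadings
are in general no longer exactly orthonormal.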

Any comments regarding this would be helpful.
By the way, I used princomp. I did not manage to get any scores data using prcomp.
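
On the scores: in current versions of R, `prcomp` returns the scores in the
`x` component of its result, while `princomp` returns them in `scores`. A
minimal sketch on toy data (the matrix `X` is made up for the example):

```r
set.seed(2)
X <- matrix(rnorm(40), nrow = 10)   # toy data, 10 rows x 4 columns

pc1 <- prcomp(X, scale. = TRUE)
pc2 <- princomp(X, cor = TRUE)

head(pc1$x)        # scores from prcomp
head(pc2$scores)   # scores from princomp
```

The two sets of scores may differ in sign and by a small scaling factor,
because `princomp` uses divisor n and `prcomp` uses n - 1.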


Peter Ho
Escola Superior de Biotecnologia
Universidade Católica Portuguesa
Rua Dr. António Bernardino de Almeida
4200 Porto
Tel: ++351-2-5580043

