[R] PCAgrid scores cannot be predicted correctly

Manuel Weinkauf Manuel.Weinkauf at gmx.de
Mon Jul 6 17:52:38 CEST 2015


I have been running into a problem with the PCAgrid() function of the 
R-package pcaPP. When trying to predict the scores of new data using the 
resulting PCA, the predicted score values are wrong. This is true for 
both, using the predict() function and for calculating the scores 
manually after scaling the data.

The example below illustrates that: In princomp(), when I either use 
predict() or the manual calculation of the scores of the data originally 
used for the PCA, the predicted points coincide exactly with the PCA 
scores (as it should be, as they are using the same raw data). However, 
when doing this with the princomp-object calculated with the pcaPP 
package, the predicted values are nowhere close to where they should be. 
Note that the results of predict() exactly match the results calculated 
manually, so it is not that predict() could not handle this object 
correctly.

Can anybody explain this weird behaviour to me? Thanks a lot.


##EXAMPLE##
library(pcaPP)

#Create data
t1<-rnorm(10, 10, 1)
t2<-rnorm(10, 8, 2)
t3<-rnorm(10, 60, 4)
t4<-rnorm(10, 1, 0.05)
Dat<-matrix(c(t1, t2, t3 , t4), ncol=4)
colnames(Dat)<-paste("Var", 1:4, sep=".")

win.graph(20, 10, 10)
layout(matrix(c(1, 2), 1, 2))
#Normal PCA
PCA<-princomp(Dat)
PCA.pred<-predict(PCA, Dat)
Dat.Scale<-scale(Dat, center=PCA$center, scale=PCA$scale)
Load<-PCA$loadings
PCA.man<-matrix(NA, nrow(PCA$scores), 2)
for (i in 1:nrow(PCA.man)) {
PCA.man[i,1]<-Dat.Scale[i,1]*Load[1,1]+Dat.Scale[i,2]*Load[2,1]+Dat.Scale[i,3]*Load[3,1]+Dat.Scale[i,4]*Load[4,1]
PCA.man[i,2]<-Dat.Scale[i,1]*Load[1,2]+Dat.Scale[i,2]*Load[2,2]+Dat.Scale[i,3]*Load[3,2]+Dat.Scale[i,4]*Load[4,2]
}
plot(PCA$scores[,1:2], pch=1, cex=2, main="Normal PCA")
points(PCA.pred, pch=16)
points(PCA.man, pch=0, col="red", cex=3)
legend("topleft", pch=c(1, 16, 0), col=c("black", "black", "red"), 
legend=c("Ordination", "Prediction", "Manual prediction"))

#robust PCA
rPCA<-PCAgrid(Dat, k=4, center=median, method="qn", scale=mad)
rPCA.pred<-predict(rPCA, Dat)
Dat.Scale<-scale(Dat, center=rPCA$center, scale=rPCA$scale)
Load<-rPCA$loadings
rPCA.man<-matrix(NA, nrow(rPCA$scores), 2)
for (i in 1:nrow(rPCA.man)) {
rPCA.man[i,1]<-Dat.Scale[i,1]*Load[1,1]+Dat.Scale[i,2]*Load[2,1]+Dat.Scale[i,3]*Load[3,1]+Dat.Scale[i,4]*Load[4,1]
rPCA.man[i,2]<-Dat.Scale[i,1]*Load[1,2]+Dat.Scale[i,2]*Load[2,2]+Dat.Scale[i,3]*Load[3,2]+Dat.Scale[i,4]*Load[4,2]
}
plot(rPCA$scores[,1:2], pch=1, cex=2, main="Robust PCA")
points(rPCA.pred, pch=16)
points(rPCA.man, pch=0, col="red", cex=3)
legend("topleft", pch=c(1, 16, 0), col=c("black", "black", "red"), 
legend=c("Ordination", "Prediction", "Manual prediction"))



-- 
Dr Manuel Weinkauf
MARUM Bremen
Room MARUM II—2050
Leobener Straße
28359 Bremen
Germany
e-mail: mweinkauf at marum.de



More information about the R-help mailing list