[R] prcomp compared to SPAD

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Oct 3 13:29:28 CEST 2000


> Date: Tue, 03 Oct 2000 11:09:06 +0000
> From: Christine Serres <serres at valigen.net>
> 
> 
> I've used the example given in the documentation for the prcomp function
> both in R and SPAD to compare the results obtained.
> Surprisingly, I do not obtain the same results for the coordinates of
> the principal components with these two software packages.
> 
> 
> Using the USArrests data, I obtain with R:
> 
> > summary(prcomp(USArrests))
> Importance of components:
>                           PC1     PC2    PC3     PC4
> Standard deviation     83.732 14.2124 6.4894 2.48279
> Proportion of Variance  0.966  0.0278 0.0058 0.00085
> Cumulative Proportion   0.966  0.9933 0.9991 1.00000

Read on:

> summary(prcomp(USArrests, scale=T))
Importance of components:
                        PC1   PC2    PC3    PC4
Standard deviation     1.57 0.995 0.5971 0.4164
Proportion of Variance 0.62 0.247 0.0891 0.0434
Cumulative Proportion  0.62 0.868 0.9566 1.0000

> And using SPAD (from the French publisher CISIA):
> 
> Ex:           sd        pv        cp
> comp1   |   2.4802   |   62.01  |   62.01  |
> comp2   |   0.9898   |   24.74  |   86.75  |
> comp3   |   0.3566   |    8.91  |   95.66  |
> comp4   |   0.1734   |    4.34  |  100.00  |

Also

> summary(princomp(USArrests, cor=T))
Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5748783 0.9948694 0.5971291 0.41644938
Proportion of Variance 0.6200604 0.2474413 0.0891408 0.04335752
Cumulative Proportion  0.6200604 0.8675017 0.9566425 1.00000000

BTW, it looks like SPAD's `sd' are in fact variances, for the square
of the first line here is
   Comp.1    Comp.2    Comp.3    Comp.4 
2.4802416 0.9897652 0.3565632 0.1734301 
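
A short sketch of that check, assuming only the USArrests data set that
ships with R (the variable names fit and variances are mine, for
illustration). It squares the standard deviations from the
correlation-based fit and expresses them as percentages, which can
then be set beside SPAD's `sd' and `pv' columns:

```r
# USArrests is in the datasets package, attached by default.
fit <- princomp(USArrests, cor = TRUE)

# Squared standard deviations, i.e. the component variances
# (eigenvalues of the correlation matrix).
variances <- fit$sdev^2
print(round(variances, 4))

# The same quantities as percentages of the total variance.
print(round(100 * variances / sum(variances), 2))
```

With four standardised variables the variances should sum to 4, which
is a quick sanity check that the fit used the correlation matrix.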


> Am I using R incorrectly? Why are the results so different?

In this dataset you do want scaling, as the variables are not on a
common scale.  But SPAD has apparently scaled by default, and
apparently mis-labelled its results.

> Furthermore, could anyone explain to me the difference between prcomp and
> princomp, since we do not obtain exactly the same results using these
> two functions?

They differ in the definition of variance: princomp uses divisor n,
prcomp uses n-1.  It's on the help page for princomp!  If you scale,
the standard deviations do not differ; otherwise there is an n vs n-1
factor.  The reasons are both S-PLUS compatibility and to allow
princomp to use robust principal components.
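
As an illustration of the n vs n-1 factor, here is a minimal sketch on
USArrests (the variable names are mine).  It compares the unscaled
standard deviations from the two functions after correcting for the
divisor, and then the scaled ones directly:

```r
n <- nrow(USArrests)

# Unscaled fits: prcomp uses divisor n - 1, princomp uses divisor n.
sd_prcomp   <- prcomp(USArrests)$sdev
sd_princomp <- unname(princomp(USArrests)$sdev)

# Rescaling by sqrt((n - 1) / n) should reconcile the two.
print(all.equal(sd_princomp, sd_prcomp * sqrt((n - 1) / n)))

# Scaled fits: both work from the correlation matrix, so the
# standard deviations should agree without any correction.
sd_prcomp_scaled   <- prcomp(USArrests, scale. = TRUE)$sdev
sd_princomp_scaled <- unname(princomp(USArrests, cor = TRUE)$sdev)
print(all.equal(sd_princomp_scaled, sd_prcomp_scaled))
```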

> And how can I obtain the coordinates of the points on the first component
> using R?

predict on a princomp fit, or the x component of a prcomp fit made
with retx=TRUE.
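
Both routes, sketched on USArrests with scaling (the object names pc
and pm are only for illustration):

```r
# prcomp: with retx = TRUE the rotated data are stored in $x.
pc <- prcomp(USArrests, scale. = TRUE, retx = TRUE)
coords_prcomp <- pc$x[, 1]        # coordinates on the first component
print(head(coords_prcomp))

# princomp: predict() with no new data returns the scores of the fit.
pm <- princomp(USArrests, cor = TRUE)
coords_princomp <- predict(pm)[, 1]
print(head(coords_princomp))

# The sign of a component is arbitrary, so compare the two sets of
# coordinates via the absolute correlation rather than element-wise.
print(abs(cor(coords_prcomp, coords_princomp)))
```

The two sets of coordinates need not be identical: the sign of a
component is arbitrary, and the two functions standardise the
variables with different divisors (n-1 vs n), so the values can differ
by a constant factor even though the standard deviations of the
components agree.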

You will find all this in Venables & Ripley, for example.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



