[R] Using pam, agnes or clara as prediction models?

Thu Jan 15 09:32:45 CET 2004

On Thu, 15 Jan 2004, Renald Buter wrote:

> On Wed, Jan 14, 2004 at 03:18:10PM -0500, Liaw, Andy wrote:
> > If pam produces the cluster medoids, you should be able to use the
> > 1-nearest-neighbor classifier for prediction of future data, using the
> > medoids as the `training' data.  1-NN is available in the `class' package,
> > part of the `VR' bundle.
> > 
> 
> Thanks very much for your quick answer! I've tried your suggestion in
> the following way:
> 
>  # separate the ruspini data into train and test set
>  > train<-ruspini[1:50,]
>  > test<-ruspini[51:75,]
>  > pamx<-pam(train,4)
>  > knnx<-knn(pamx$medoids,test,factor(c("a","b","c","d")),k=3)
>  > knnx
>  [1] d d b b d c b c c d c a a d c c a a c a a d c d a
>  Levels: a b c d
> 
> But the result of applying the test set to the knn should only contain 2
> clusters, since the upper half of the ruspini data contains only 2
> clusters.
> 
> Could you tell me what I am missing here?

You asked pam to divide the training set (rows 1:50) into 4 clusters.  Did
you look at the object pamx?  It contains 4 clusters covering only that
first part of the dataset.

Given that when you apply pam to the whole dataset there is a cluster that
only occurs for objects 61:75, there is no way you can find that cluster
when no member of it is in your training set.
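
A minimal sketch of how one might check this, assuming the `cluster' and
`class' packages are installed.  The names full, labels and pred are
illustrative and not from the original post.  It uses k=1, as suggested
earlier in the thread; with only four medoids, each carrying a different
label, k=3 gives a three-way tie on every vote, and knn breaks ties at
random.

```r
library(cluster)  # pam() and the ruspini data
library(class)    # knn()

data(ruspini)
train <- ruspini[1:50, ]
test  <- ruspini[51:75, ]

# Cluster all 75 points, then tabulate membership by block of rows
# to see which clusters have members in the training rows 1:50.
full <- pam(ruspini, 4)
print(table(cluster = full$clustering,
            rows = cut(1:75, c(0, 50, 60, 75))))

# Cluster the training rows only and use the medoids as a 1-NN classifier.
pamx   <- pam(train, 4)
labels <- factor(seq_len(nrow(pamx$medoids)))
pred   <- knn(pamx$medoids, test, cl = labels, k = 1)

# Every test point is assigned to one of the clusters found in train;
# a cluster with no members in train cannot appear among the labels.
print(table(pred))
```

The first table should show whether any cluster of the full data lies
entirely outside rows 1:50; the second shows how the test points are spread
over the four training-set medoids.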

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595