[R] Using pam, agnes or clara as prediction models?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Jan 15 09:32:45 CET 2004
On Thu, 15 Jan 2004, Renald Buter wrote:
> On Wed, Jan 14, 2004 at 03:18:10PM -0500, Liaw, Andy wrote:
> > If pam produces the cluster medoids, you should be able to use the
> > 1-nearest-neighbor classifier for prediction of future data, using the
> > medoids as the `training' data. 1-NN is available in the `class' package,
> > part of the `VR' bundle.
> >
>
> Thanks very much for your quick answer! I've tried your suggestion in
> the following way:
>
> # separate the ruspini data into train and test set
> > train<-ruspini[1:50,]
> > test<-ruspini[51:75,]
> > pamx<-pam(train,4)
> > knnx<-knn(pamx$medoids,test,factor(c("a","b","c","d")),k=3)
> > knnx
> [1] d d b b d c b c c d c a a d c c a a c a a d c d a
> Levels: a b c d
>
> But the knn result for the test set should contain only 2 clusters,
> since the upper half of the ruspini data contains only 2 clusters.
>
> Could you tell me what I am missing here?
You asked that the training set (objects 1:50) be divided into 4 clusters.
Did you look at the object pamx? It contains 4 clusters covering only the
first 50 objects of the dataset.
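A minimal sketch of inspecting pamx and then applying the 1-NN idea suggested earlier in the thread. It assumes a current version of the cluster and class packages, where the pam() result also carries the medoid row indices as id.med. The names medoid_labels and pred are illustrative only. Each medoid is labelled with its own cluster number and knn() is called with k = 1. With k = 3, as in the quoted call, the three nearest neighbours of any test point are three medoids with three different labels, so every vote is a tie that knn() breaks at random.

```r
library(cluster)  # pam() and the ruspini data
library(class)    # knn()
data(ruspini)

train <- ruspini[1:50, ]
test  <- ruspini[51:75, ]
pamx  <- pam(train, 4)

# Inspect the fit: four medoids, all taken from the 50 training objects
print(pamx$medoids)
print(table(pamx$clustering))

# Label each medoid with the cluster it represents (assumes id.med is present)
medoid_labels <- factor(pamx$clustering[pamx$id.med])

# 1-NN: assign each test object to the cluster of its nearest medoid
pred <- knn(train = pamx$medoids, test = test, cl = medoid_labels, k = 1)
print(table(pred))
```

Whatever knn() returns here, it can only choose among the four clusters pam() found in objects 1:50.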
Given that when you apply pam to the whole dataset there is a cluster that
only occurs for objects 61:75, there is no way you can find that cluster
when no member of it is in your training set.
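One way to check the point about the missing cluster is to cluster all 75 objects and compare which clusters occur among the training rows (1:50) and the test rows (51:75). This is a sketch; the names full, in_train, in_test and unseen are illustrative. Going by the explanation above, unseen should contain the cluster formed by objects 61:75, which no classifier built from objects 1:50 can recover.

```r
library(cluster)  # pam() and the ruspini data
data(ruspini)

full <- pam(ruspini, 4)  # cluster all 75 objects at once

in_train <- unique(full$clustering[1:50])   # clusters present in the training rows
in_test  <- unique(full$clustering[51:75])  # clusters present in the test rows

# Clusters occurring in the test rows with no member among the training rows
unseen <- setdiff(in_test, in_train)
print(unseen)
print(which(full$clustering %in% unseen))  # the objects forming those clusters
```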
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595