[R] Clustering quality measure
Martin Maechler
maechler at stat.math.ethz.ch
Wed Jun 18 10:34:50 CEST 2003
>>>>> "Jonck" == Jonck van der Kogel <jonck at vanderkogel.net>
>>>>> on Tue, 17 Jun 2003 17:23:33 +0200 writes:
Jonck> Hi all, I am running a series of experiments where
Jonck> after manipulating my data I run several clustering
Jonck> algorithms (agnes, diana and a clustering method of
Jonck> my own) on the data. I wanted to determine which
Jonck> clustering method did the best job, so therefore I
Jonck> had defined my own quality measure using two
Jonck> criteria: compactness of the data within the clusters
Jonck> themselves and the amount of separation between the
Jonck> clusters. Anyway, my quality measure does not work,
Jonck> since according to my quality measure the quality
Jonck> gets increasingly better as more clusters are formed
Jonck> until every data instance is a cluster by itself.
Jonck> Therefore I was wondering if any of you are aware of
Jonck> any libraries or functions within R that determine
Jonck> quality measures of clusterings. I am very much
Jonck> intrigued by the definition of quality measures that
Jonck> do work. Thanks very much, Jonck
Well, "do work" is saying a lot.
But there's silhouette() in the `cluster' package {where agnes()
and diana() reside}. You can plot silhouettes of almost any
clustering {i.e. grouping} as a diagnostic, and the "Average
Silhouette Width" has been proposed as a "goodness of fit" measure
for clusters, and even to determine how many clusters you should
choose.
One of its several drawbacks is that it's not defined for the
"only 1 cluster" situation, i.e., you cannot use it to compare
one vs two clusters.
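
As a rough illustration of using the average silhouette width to compare
clusterings, here is a minimal sketch. The toy data, the helper avg_sil()
and the range of k are assumptions made up for this example; only agnes(),
diana() and silhouette() come from the `cluster' package. Note that k
starts at 2, for the reason given above.

```r
library(cluster)   # provides agnes(), diana() and silhouette()

set.seed(1)
# Hypothetical toy data: three well-separated Gaussian blobs in 2-D
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2))
d <- dist(x)

# Fit each hierarchical method once on the dissimilarities
ag <- agnes(d)
di <- diana(d)

# Hypothetical helper: average silhouette width after cutting a tree
# into k groups
avg_sil <- function(tree, k, d) {
    grp <- cutree(as.hclust(tree), k = k)
    mean(silhouette(grp, d)[, "sil_width"])
}

ks  <- 2:8   # k = 1 is left out: the silhouette is not defined there
res <- data.frame(k     = ks,
                  agnes = sapply(ks, function(k) avg_sil(ag, k, d)),
                  diana = sapply(ks, function(k) avg_sil(di, k, d)))
print(res)
```

Unlike a pure compactness criterion, the average width need not keep
improving as k grows, so the k with the largest value in each column is
a candidate for the number of clusters. Treat it as a diagnostic, not a
verdict.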
--> ?silhouette
and look at and try the "Examples".
Regards,
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><