[R] cluster analysis
paulandpen
paulandpen at optusnet.com.au
Fri Nov 2 13:10:53 CET 2007
AMINA SHAHZADI,
The eternal question.
What I do is generate a range of solutions and profile them on the variables
used to cluster the data, plus any other information I have on the cases. I
then present the solutions to a group of other people to assess
meaningfulness, debate the solutions and attempt to reach a consensus.
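A minimal sketch of the "generate a range of solutions and profile them" step,
using only base R. The iris data, the range of 2 to 6 clusters, and the helper
name profile_clusters are assumptions made purely for illustration; substitute
your own data and profiling variables.

```r
# Illustrative data only: the four numeric columns of the built-in iris data,
# standardised so that no single variable dominates the distances.
x <- scale(iris[, 1:4])

set.seed(1)  # k-means starts from random centres, so fix the seed

# Fit a range of candidate solutions (2 to 6 clusters, an arbitrary choice).
ks <- 2:6
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25))
names(fits) <- paste0("k", ks)

# Hypothetical helper: mean of each clustering variable within each cluster,
# reported on the original measurement scale.
profile_clusters <- function(fit, data) {
  aggregate(data, by = list(cluster = fit$cluster), FUN = mean)
}

profiles <- lapply(fits, profile_clusters, data = iris[, 1:4])
print(profiles$k3)

# Profile on other information not used in the clustering (here, Species).
print(table(cluster = fits$k3$cluster, species = iris$Species))
```

The printed profiles are what I would take to the group for discussion; the
code does not pick a solution for you.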
In many cases, e.g. for algorithms based on k-means and hierarchical
clustering, you are using an exploratory technique and there are no
right or wrong answers to this.
Having used cluster analysis for years, here are some things to look at,
because there is no way to answer this statistically (unless you are using a
latent class type model with goodness-of-fit measures):
1. What is the minimum size you believe to be robust for a single cluster
(e.g. n=30, n=100)? The larger the number of clusters you generate relative
to sample size, the smaller your clusters will be, so you need to define a
cut-off below which you are not prepared to go.
2. If you run the data through different algorithms, how comparable are
the results (cluster stability)?
3. What differences emerge between the 2, 3, 4 cluster solutions etc.? As you
use larger numbers of clusters, does this still produce a meaningful
result in that the clusters are distinct and unique, or are you just cutting
larger clusters into smaller clusters without generating unique and usable
information? Examine the clusters via a series of cross tabs: as you go
from 2 to 3 to 4 cluster solutions, what happens to the members within
clusters? Are they distributed differently?
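The checks in the list above can be sketched in base R roughly as follows. The
iris data, the cut-off of 30, and the choice of Ward linkage as the second
algorithm are all assumptions for illustration, not recommendations.

```r
# Illustrative data only: standardised numeric columns of iris.
x <- scale(iris[, 1:4])
set.seed(1)

# k-means memberships for the 2, 3 and 4 cluster solutions.
ks <- 2:4
km <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$cluster)
names(km) <- paste0("k", ks)

# Cluster sizes against a minimum acceptable size.
min_size <- 30  # hypothetical cut-off; choose your own
sizes <- lapply(km, table)
too_small <- sapply(sizes, function(s) any(s < min_size))
print(sizes)
print(too_small)

# Stability across algorithms: compare k-means with hierarchical clustering
# at the same number of clusters. Cluster labels are arbitrary, so look for
# one dominant cell per row rather than a matching diagonal.
hc <- hclust(dist(x), method = "ward.D2")
hc3 <- cutree(hc, k = 3)
agreement <- table(kmeans = km$k3, hclust = hc3)
print(agreement)

# Cross tabs of successive solutions: does an extra cluster split one
# existing cluster, or reshuffle members across several?
tab_2_3 <- table(k2 = km$k2, k3 = km$k3)
tab_3_4 <- table(k3 = km$k3, k4 = km$k4)
print(tab_2_3)
print(tab_3_4)
```

If a row of tab_2_3 spreads over two columns while the other row stays intact,
the third cluster is just a split of a larger one; whether that split is useful
is still a judgement call.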
Thanks, Paul
----- Original Message -----
From: "amna khan" <amnakhan493 at gmail.com>
To: <R-help at stat.math.ethz.ch>
Sent: Friday, November 02, 2007 2:19 AM
Subject: [R] cluster analysis
> Hi Sir
>
> How can we select the optimum number of clusters?
>
> Best Regards
>
> --
> AMINA SHAHZADI
> Department of Statistics
> GC University Lahore, Pakistan.