[R] clustering with hclust
Christian Hennig
ucakche at ucl.ac.uk
Fri Jul 25 13:19:19 CEST 2014
Dear Marianna,
the function agnes in package cluster can compute Ward's method directly from
a raw data matrix (at least this is what its help page suggests).
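A minimal sketch of this, assuming R >= 3.1.0 and the recommended package cluster (the data matrix x is made up for illustration). It runs agnes on the raw data and compares it with hclust on dist(x) using "ward.D2", which the hclust help page says agnes(*, method="ward") corresponds to:

```r
# Hypothetical random data: 10 observations, 4 variables.
library(cluster)

set.seed(1)
x <- matrix(rnorm(40), nrow = 10)

# agnes() takes the raw data and computes Euclidean distances itself
# (metric = "euclidean" is its default).
ag <- agnes(x, method = "ward")

# hclust() needs a "dist" object, so the distances are computed explicitly.
hc <- hclust(dist(x), method = "ward.D2")

# agnes() stores merge heights in banner order, hclust() in merge order,
# so compare the sorted heights.
same_heights <- isTRUE(all.equal(sort(ag$height), sort(hc$height)))
print(same_heights)
```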
Also, you may not be using the most recent version of hclust (the one in
R 3.1.0 and later), whose help page contains this note:
"Two different algorithms are found in the literature for Ward clustering.
The one used by option "ward.D" (equivalent to the only Ward option "ward"
in R versions <= 3.0.3) does not implement Ward's (1963) clustering
criterion, whereas option "ward.D2" implements that criterion (Murtagh and
Legendre 2013). With the latter, the dissimilarities are squared before
cluster updating. Note that agnes(*, method="ward") corresponds to
hclust(*, "ward.D2")."
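A small sketch of what "the dissimilarities are squared before cluster updating" implies, assuming R >= 3.1.0 (x is hypothetical random data): "ward.D" applied to squared Euclidean distances should give the same merges as "ward.D2" applied to plain Euclidean distances, with heights equal to the squared "ward.D2" heights.

```r
set.seed(2)
x <- matrix(rnorm(60), nrow = 15)   # hypothetical data: 15 observations

d <- dist(x)                                   # plain Euclidean distances
hc_D2      <- hclust(d,   method = "ward.D2")  # Ward's (1963) criterion
hc_D_sq    <- hclust(d^2, method = "ward.D")   # same criterion, squared input
hc_D_plain <- hclust(d,   method = "ward.D")   # old "ward" on unsquared input:
                                               # not Ward's criterion, can give
                                               # a different tree

# Same sequence of merges ...
same_merges <- identical(hc_D2$merge, hc_D_sq$merge)
# ... and heights that differ only by squaring.
same_heights <- isTRUE(all.equal(hc_D2$height^2, hc_D_sq$height))
```

In other words, the old "ward" (now "ward.D") still implements Ward's criterion when it is given squared Euclidean distances, which is the point made by Murtagh and Legendre.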
The Murtagh and Legendre paper has more details on this and is here:
http://arxiv.org/abs/1111.6285
F. Murtagh and P. Legendre, "Ward's hierarchical clustering method:
clustering criterion and agglomerative algorithm"
It's not clear to me why one would want to use Ward's method for this kind
of data, but that's your decision of course.
Best wishes,
Christian
On Fri, 25 Jul 2014, Marianna Bolognesi wrote:
> Hi everybody, I have a problem with a cluster analysis.
>
> I am trying to use hclust, method=ward.
>
> The Ward method works with SQUARED Euclidean distances.
>
> hclust demands "a dissimilarity structure as produced by dist".
>
> Yet dist does not seem to produce a table of squared Euclidean distances
> starting from cosines.
> In fact, computing the squared Euclidean distances manually from the cosines
> (d = 2(1 - cos)) produces a different outcome.
>
> As a consequence, using hclust with the Ward method on a table of cosines
> transformed into distances with dist produces a different dendrogram than
> other programs for hierarchical clustering with the Ward method (e.g.
> MultiDendrograms). Weird, right??
>
> Computing the distances manually and then feeding them to hclust produces
> an error message. So, I am wondering, what the hell is this dist function
> doing?!
>
> thanks!
>
> marianna
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
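A sketch of the cosine case, assuming the cosines come from vectors normalised to unit length (the vectors v are made up). For unit vectors a and b, ||a - b||^2 = 2 - 2 a.b = 2(1 - cos(a, b)), so 2(1 - cos) already is a squared Euclidean distance. Applying dist() to the cosine table instead treats each row of the table as a point and computes ordinary (unsquared) Euclidean distances between those rows, which is a different dissimilarity. A hand-made matrix also has to be converted with as.dist() before hclust() accepts it; that this caused the error is only a guess, since the code was not posted.

```r
set.seed(3)
v <- matrix(abs(rnorm(50)), nrow = 10)     # hypothetical: 10 items x 5 features

vn     <- v / sqrt(rowSums(v^2))           # scale each row to unit length
cosmat <- tcrossprod(vn)                   # cosmat[i, j] = cos(v_i, v_j)

sq_mat <- 2 * (1 - cosmat)                 # squared Euclidean distances
sq_mat[sq_mat < 0] <- 0                    # clamp rounding noise (e.g. diagonal)
d2_from_cos <- as.dist(sq_mat)             # plain matrix -> "dist" object

# Check the identity against distances computed from the unit vectors.
identity_holds <- isTRUE(all.equal(c(d2_from_cos), c(dist(vn)^2)))

# dist() on the cosine table: distances between ROWS of the table,
# not the distances implied by the cosines.
d_rows  <- dist(cosmat)
differs <- !isTRUE(all.equal(c(d_rows), c(sqrt(d2_from_cos))))

# Ward's criterion on the cosine-based distances, two equivalent ways.
hc_sq <- hclust(d2_from_cos, method = "ward.D")         # squared input
hc_D2 <- hclust(sqrt(d2_from_cos), method = "ward.D2")  # unsquared input
same_tree <- identical(hc_sq$merge, hc_D2$merge)
```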
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche