[R] mclust - clustering by spatial patterns

Christian Hennig fm3a004 at math.uni-hamburg.de
Fri Dec 19 11:54:38 CET 2003


Hi,

the package prabclus contains (as the function prabclust) a conversion of
presence-absence data into the output of a multidimensional scaling (MDS)
based on Jaccard or Kulczynski distances. The MDS output is then clustered
by mclust, including estimation of noise points, i.e. points that do not
belong to any cluster. This is much better than clustering the
presence-absence data directly, for two reasons: 0-1 data are far from
the normal distribution or normal mixtures, and taking every region as a
variable gives a very high dimensionality, in which mclust is often
unstable.
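A minimal sketch of this pipeline, written with base R plus mclust rather
than with prabclust itself, might look as follows. The toy matrix pa, its
size, and the choice of two MDS dimensions are assumptions made purely for
illustration; noise estimation is left out.

```r
# Toy presence-absence matrix (hypothetical data): 30 species x 50 regions.
set.seed(1)
pa <- matrix(rbinom(30 * 50, 1, 0.3), nrow = 30,
             dimnames = list(paste0("sp", 1:30), NULL))

# Jaccard distance between species (rows). For 0-1 data,
# dist(method = "binary") is the proportion of regions occupied by exactly
# one of two species among the regions occupied by at least one of them.
d <- dist(pa, method = "binary")

# Classical MDS into a low-dimensional Euclidean space
# (k = 2 is an arbitrary choice for this sketch).
pts <- cmdscale(d, k = 2)

# Gaussian mixture clustering of the MDS points; Mclust selects the
# number of clusters and the covariance model by BIC.
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)
  fit <- Mclust(pts)
  print(table(fit$classification))
}
```

The clustering is run on the 30 x 2 matrix of MDS coordinates, not on the
30 x 50 matrix of 0-1 values, which is the point of the conversion.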

We have some experience with clustering presence-absence data with
MDS/mclust as well as with distance-based methods such as average/complete
linkage in hclust. All methods may be somewhat unstable; results have to be
interpreted with care. The advantages of mclust on MDS data are:
1) Automatic decision about number of clusters and presence of noise
points (this is done not by mclust, but by nnclean included in package
prabclus),
2) Clusters may have different variance/covariance structures, which may
be useful if some "real" clusters contain very similar presence patterns
while others are more widespread. Such a situation often confuses complete
linkage and related algorithms.
3) You get a "natural" visualization (MDS solution) of your clustering.
(You can do this without performing mclust, though.)
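For comparison, the distance-based alternative mentioned above could be
sketched like this, again on hypothetical toy data. Note that the number
of clusters (k = 3 here, an arbitrary assumption) has to be supplied by
hand, and that the MDS plot is available without running mclust.

```r
# Toy presence-absence matrix (hypothetical data): 30 species x 50 regions.
set.seed(1)
pa <- matrix(rbinom(30 * 50, 1, 0.3), nrow = 30)

# Jaccard distance between species.
d <- dist(pa, method = "binary")

# Average and complete linkage on the same distances.
hc_avg <- hclust(d, method = "average")
hc_com <- hclust(d, method = "complete")

# Unlike the BIC-based choice in mclust, the tree must be cut at a
# user-chosen number of clusters.
k <- 3
cl_avg <- cutree(hc_avg, k = k)
cl_com <- cutree(hc_com, k = k)

# Cross-tabulate the two partitions to see how far they agree.
print(table(average = cl_avg, complete = cl_com))

# MDS visualization of the average linkage clustering.
plot(cmdscale(d, k = 2), col = cl_avg, pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
```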

A drawback of MDS/mclust compared to distance-based methods is the
additional loss of information, and often instability, induced by the MDS.
In our experience, the results of Kruskal's MDS often vary significantly
(not only by rotation) between different machines. (Brian Ripley once told
me that Kruskal's MDS is stable in most cases; I think the particular
structure of presence-absence data and the usual distances for these data
make a difference here.) Therefore we suggest using classical MDS, which
has other drawbacks, though.
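One way to look at the difference between the two MDS variants on a given
data set is sketched below, assuming the MASS package (for isoMDS, an
implementation of Kruskal's non-metric MDS) is installed. The toy data are
hypothetical, and the comparison via interpoint distances is only one
possible choice.

```r
# Toy presence-absence matrix (hypothetical data): 30 species x 50 regions.
set.seed(1)
pa <- matrix(rbinom(30 * 50, 1, 0.3), nrow = 30)
d <- dist(pa, method = "binary")   # Jaccard distance

# Classical MDS: a closed-form eigendecomposition, no iteration involved.
cl <- cmdscale(d, k = 2)

if (requireNamespace("MASS", quietly = TRUE)) {
  # Kruskal's MDS: iterative, started by default from the classical
  # solution. It stops with an error if two distinct objects have
  # distance zero, which can happen with identical presence patterns.
  kr <- MASS::isoMDS(d, k = 2, trace = FALSE)

  # Configurations are only determined up to rotation and reflection,
  # so compare interpoint distances instead of raw coordinates.
  print(cor(as.vector(dist(cl)), as.vector(dist(kr$points))))
  print(kr$stress)
}
```

Running this on different machines and comparing the printed values would
be a simple check of the stability issue described above.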

The package prabclus also contains a method to test whether there is any
clustering at all. This test is based only on distances; there is no
further information reduction by MDS.

Best,
Christian

On Thu, 18 Dec 2003, Thomas W Blackwell wrote:

> On Thu, 18 Dec 2003, Jarrod Hadfield wrote:
> 
> > Dear All,
> >
> > I have spatial data (presence/absence for 4000 squares) on 250 bird
> > species and would like to use a model-based clustering technique to
> > test for species associations.  Is there any way of passing a
> > distance/correlation matrix to mclust as with hclust, rather than the
> > actual data?  Or alternatively, is there a way of getting mclust to
> > handle binary data?
> >
> > I'd appreciate any suggestions!
> 
> Why not simply use  dist()  and  hclust() ?   Starting with
> presence/absence data, what could  mclust()  possibly do that
> is different from  hclust() ?

...see above...

> 
> >
> > Cheers,
> >
> > Jarrod
> >
> 
> -  tom blackwell  -  u michigan medical school  -  ann arbor  -
> 
> 

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de



