[R] Can I compare two clusters without using their distance-matrix (dist()) ?
Christian Hennig
chrish at stats.ucl.ac.uk
Wed Apr 21 19:16:58 CEST 2010
Dear Tal,
I took the definitions of the Hubert gamma and Dunn indexes from the Gordon
book. They are actually not about comparing two clusterings, at least not in
that reference, and they require dissimilarities.
The adjusted Rand index and Meila's VI, as implemented in
cluster.stats, compare two clusterings. If you set compareonly=TRUE in
cluster.stats, it only computes these two indexes, so it doesn't need the
dissimilarity matrix in principle. In the next update I will probably
change it so that in this case you don't need to provide a
dissimilarity matrix.
Until then, you can supply a noninformative matrix.
Example:
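library(fpc)  # provides cluster.stats()
# Two arbitrary clusterings of 100 objects
c1 <- sample(4,100,replace=TRUE)
c2 <- sample(5,100,replace=TRUE)
# The zero matrix is only a placeholder; the two comparison indexes do not depend on it
cs <- cluster.stats(d=matrix(0,ncol=100,nrow=100),c1,c2,compareonly=TRUE)
cs$corrected.rand  # adjusted Rand index
cs$vi              # Meila's variation of information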
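For reference, both indexes depend only on the contingency table of the two
label vectors, so they can also be computed without any matrix at all. The
following is a minimal sketch in base R, not the fpc code: the function names
adjusted_rand and variation_of_information are made up for illustration, and
natural logarithms are assumed for the VI.

```r
# Hypothetical helpers (not part of fpc); base R only, no distance matrix.

# Adjusted Rand index from the contingency table of two label vectors.
# Undefined (NaN) in the degenerate case where both clusterings put all
# objects in a single cluster or each object in its own cluster.
adjusted_rand <- function(c1, c2) {
  tab <- table(c1, c2)                     # counts n_ij
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))            # pairs joined in both clusterings
  sum_a <- sum(choose(rowSums(tab), 2))    # pairs joined in c1
  sum_b <- sum(choose(colSums(tab), 2))    # pairs joined in c2
  expected <- sum_a * sum_b / choose(n, 2) # expected value under random labels
  (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)
}

# Variation of information: VI = 2*H(C1,C2) - H(C1) - H(C2),
# with entropies in natural-log units (an assumption of this sketch).
variation_of_information <- function(c1, c2) {
  p <- table(c1, c2) / length(c1)          # joint distribution of the labels
  entropy <- function(q) {
    q <- q[q > 0]                          # treat 0 * log(0) as 0
    -sum(q * log(q))
  }
  2 * entropy(p) - entropy(rowSums(p)) - entropy(colSums(p))
}
```

Calling adjusted_rand(c1, c2) and variation_of_information(c1, c2) on the
vectors from the example gives values that can be checked against
cs$corrected.rand and cs$vi; whether they agree exactly depends on the
logarithm base used in fpc, which this sketch does not verify.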
Hope this helps,
Christian
On Wed, 21 Apr 2010, Tal Galili wrote:
> Thanks for the fast reply Uwe.
>
> My hope in posting this was to find if anyone had already done work (in R)
> in this direction. So far I wasn't able to find any such relevant code, so
> I turned to the mailing list.
>
> Regarding new implementations - thanks for offering! - I have already come
> across one such algorithm - I implemented it, and will probably publish it
> on my blog <http://www.r-statistics.com/> in the near future.
>
> If anyone else has a reference to an R implementation, it would be most
> helpful,
> Tal
>
>
> ----------------Contact Details:-------------------------------------------------------
> Contact me: Tal.Galili at gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ----------------------------------------------------------------------------------------------
>
> 2010/4/21 Uwe Ligges <ligges at statistik.tu-dortmund.de>
>
>> On 21.04.2010 18:15, Tal Galili wrote:
>>
>>> Hello all,
>>>
>>> I would like to compare the similarity of two cluster solutions using a
>>> validation criterion (such as Hubert's gamma coefficient, the Dunn index,
>>> the corrected Rand index and so on).
>>>
>>> I see (from here: http://www.statmethods.net/advstats/cluster.html) that
>>> the function cluster.stats() in the fpc package provides a mechanism
>>> for comparing 2 cluster solutions - *BUT* - it requires me to give
>>> the distance matrix among objects.
>>>
>>> *My question* is: What ways can you suggest for comparing two cluster
>>> solutions while using the cluster indicators only (i.e. a vector saying
>>> to which cluster each object belongs), and WITHOUT having to submit the
>>> distance matrix between the objects?
>>>
>>
>> Don't know. If you have a theoretical solution and can provide the
>> description of a method, there will be many people around happy to make an
>> algorithm and implement it.
>>
>> Uwe Ligges
>>
>>
>>
>>> Thanks,
>>> Tal
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche