[R] pam() seems to ignore cluster number
Martin Maechler
maechler at stat.math.ethz.ch
Tue May 24 08:50:17 CEST 2011
>>>>> Dario Strbenac <D.Strbenac at garvan.org.au>
>>>>> on Wed, 18 May 2011 12:00:11 +1000 writes:
> I am using PAM with k = 10 clusters, but I only get one cluster
> ID for all my observations. I couldn't find any discussion about
> this in the help file, or mailing lists. Is there a reasonable
> explanation for this result ?
> cIDs <- pam(all, 10, cluster.only = TRUE, do.swap = FALSE)
>> table(cIDs)
> cIDs
> 0
> 16671
> The matrix of observations can be found at :
> http://129.94.136.7/file_dump/dario/all.obj
For the mailing list archives:
Dario's data contained so many NAs that some of the computed
dissimilarities "had to be" NA as well.
Had he used
pam(all, 10)
or
pam(all, 10, do.swap = FALSE)
he would have got the error message
"No clustering performed, NAs in the computed dissimilarity matrix."
But because of 'cluster.only=TRUE'
*and* because of a lapsus of the 'cluster' maintainer (me),
pam() returned without the error message in this case.
The next release of R (or of 'cluster') will give the error
message also in the case of 'cluster.only=TRUE' .
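Until then, one way to catch this situation yourself is to compute the dissimilarities explicitly and count the NAs before calling pam(). The following is only a minimal sketch on a made-up toy matrix (not Dario's data), and it assumes Euclidean distances via dist(), which leaves a dissimilarity NA when two rows have no column in which both are observed:

```r
# Toy data (hypothetical): rows 1 and 2 share no column in which both
# are observed, so their dissimilarity cannot be computed.
x <- rbind(c( 1, NA,  3, NA),
           c(NA,  2, NA,  4),
           c( 1,  2,  3,  4),
           c( 2,  3,  4,  5))

d <- dist(x)            # pairwise distances; NAs are skipped pair by pair
n_na <- sum(is.na(d))   # number of undefined dissimilarities

if (n_na > 0) {
  # pam() has nothing sensible to work with in this case
  message(n_na, " dissimilarities are NA; not calling pam()")
} else {
  cIDs <- cluster::pam(d, 2, cluster.only = TRUE)
}
```

If the count is non-zero, the observations (or variables) with too many NAs have to be dealt with before any clustering makes sense.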
Martin Maechler, ETH Zurich
> I'm using R version 2.13.0 (2011-04-13) on Platform:
> x86_64-unknown-linux-gnu (64-bit) and have cluster_1.13.3.
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia