[R] silhoutte.default bugs
Murad Nayal
mn216 at columbia.edu
Wed Jan 21 21:19:28 CET 2004
Hello all,
This might have been fixed in later versions (I am using R1.7.0), r-help
archive contains messages reporting similar problems but no reports of
codes fixes. I have encountered a couple of problems using the
silhouette function. one occurs when the clustering contains clusters
composed of 1 element (Martin Maechler posted code few months ago that
fixes a similar problem that occurs when clusters have only 2 elements
but not the case with 1 element). the other problem is due to
silhouette's assumption that the clusters are numbered sequentially
starting at 1. one of the clustering programs I use (snob) assigns more
or less arbitrary integer ids to clusters starting from 3! (clusters 1
and 2 have special meaning in snob). the modified code fixing both
problems is included below, changes are commented.
best
Murad
silhouette.default <-
function (x, dist, dmatrix, ...)
{
cll <- match.call()
if (!is.null(cl <- x$clustering))
x <- cl
n <- length(x)
if (!all(x == round(x)))
stop("`x' must only have integer codes")
k <- length(clid <- sort(unique(x)))
if (k <= 1 || k >= n)
return(NA)
if (missing(dist)) {
if (missing(dmatrix))
stop("Need either a dissimilarity `dist' or diss.matrix
`dmatrix'")
if (is.null(dm <- dim(dmatrix)) || length(dm) != 2 ||
!all(n == dm))
stop("`dmatrix' is not a dissimilarity matrix compatible to
`x'")
}
else {
dist <- as.dist(dist)
if (n != attr(dist, "Size"))
stop("clustering `x' and dissimilarity `dist' are
incompatible")
dmatrix <- as.matrix(dist)
}
wds <- matrix(NA, n, 3, dimnames = list(names(x), c("cluster",
"neighbor", "sil_width")))
for (j in 1:k) {
Nj <- sum(iC <- x == clid[j])
#
# the following line changed from wds[iC, "cluster"] <- j
#
wds[iC, "cluster"] <- clid[j]
a.i <- if (Nj > 1)
colSums(dmatrix[iC, iC])/(Nj - 1)
else 0
#
# the following line changed from
# diC <- rbind(apply(dmatrix[!iC, iC], 2, function(r) tapply(r,
# x[!iC], mean)))
#
diC <- rbind(apply(cbind(dmatrix[!iC, iC]), 2, function(r)
tapply(r,
x[!iC], mean)))
minC <- max.col(-t(diC))
wds[iC, "neighbor"] <- clid[-j][minC]
#
# the following line changed from
# b.i <- diC[cbind(minC, seq(minC))]
#
b.i <- diC[cbind(minC, seq(along=minC))]
s.i <- (b.i - a.i)/pmax(b.i, a.i)
wds[iC, "sil_width"] <- s.i
}
attr(wds, "Ordered") <- FALSE
attr(wds, "call") <- cll
class(wds) <- "silhouette"
wds
}
--
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884 Fax: 212-305-6926
More information about the R-help
mailing list