[R] kmeans (again)
Luis Torgo
ltorgo at liacc.up.pt
Thu Jun 5 20:04:35 CEST 2003
Regarding a previous question about the kmeans function: I have tried the
same example and I also get a strange result (at least according to what the
help page of kmeans says). Apparently, the function is disregarding the
initial cluster centers one gives it. According to the help page of the
function:
centers: Either the number of clusters or a set of initial cluster
centers...
Now a small dataset:
> data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
If I use rows 3 and 4 as cluster centers and a single iteration of kmeans I
get:
> kmeans(data,data[c(3,4),],1)
$cluster
[1] 1 1 1 1 2 2
$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50
$withinss
[1] 32.9375 6.5000
$size
[1] 4 2
If I now use rows 1 and 6 as cluster centers I get exactly the same solution
after the first iteration:
> kmeans(data,data[c(1,6),],1)
$cluster
[1] 1 1 1 1 2 2
$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50
$withinss
[1] 32.9375 6.5000
$size
[1] 4 2
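For comparison, here is a minimal sketch of what a single plain assign-then-update step would give from these two sets of starting centers. It assumes that "one iteration" means one such step, which may not be how kmeans counts iterations internally. The helper lloyd_step and the name pts are made up for illustration and are not part of kmeans.

```r
# Same six points as above (named pts here to avoid masking the data() function)
pts <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

# Hypothetical helper, not part of kmeans: one assign-then-update step
lloyd_step <- function(x, centers) {
  # squared Euclidean distance of every point (rows) to every center (columns)
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  cluster <- apply(d2, 1, which.min)   # nearest center for each point
  # new centers = per-cluster column means (assumes no cluster ends up empty)
  new_centers <- rowsum(x, cluster) / as.vector(table(cluster))
  list(cluster = cluster, centers = new_centers)
}

step34 <- lloyd_step(pts, pts[c(3, 4), ])   # start from rows 3 and 4
step16 <- lloyd_step(pts, pts[c(1, 6), ])   # start from rows 1 and 6
step34$cluster   # 1 1 1 2 1 2
step16$cluster   # 1 1 1 2 2 2
```

Under that assumption the two starts would lead to different partitions after one step (points 1, 2, 3, 5 against 4, 6 with centers (2, 1) and (5.75, 5) in the first case, points 1, 2, 3 against 4, 5, 6 in the second), and neither of them matches the kmeans output above.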
So, apparently the function is disregarding the initial cluster centers
information. This is even "confirmed" by the fact that if I use the function
without cluster centers, simply giving it the number of clusters, I get the
same solution (only the cluster labels are swapped):
> kmeans(data,2,1)
$cluster
[1] 2 2 2 2 1 1
$centers
   [,1] [,2]
1 8.000 2.50
2 0.875 2.25
$withinss
[1] 6.5000 32.9375
$size
[1] 2 4
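Since the cluster numbers themselves are arbitrary, one quick way to check that two runs describe the same partition is to cross-tabulate their label vectors. The sketch below simply retypes the $cluster vectors printed above; the variable names are made up for illustration.

```r
# $cluster vectors copied from the outputs above
cl_centers <- c(1, 1, 1, 1, 2, 2)   # from kmeans(data, data[c(3,4),], 1)
cl_number  <- c(2, 2, 2, 2, 1, 1)   # from kmeans(data, 2, 1)

# When two partitions agree up to a renaming of the labels, every row and
# every column of the cross-table has exactly one non-zero cell
tab <- table(cl_centers, cl_number)
tab
same_partition <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
same_partition
```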
--
Luis Torgo
FEP/LIACC, University of Porto Phone : (+351) 22 607 88 30
Machine Learning Group Fax : (+351) 22 600 36 54
R. Campo Alegre, 823 email : ltorgo at liacc.up.pt
4150 PORTO - PORTUGAL WWW : http://www.liacc.up.pt/~ltorgo