[R] kmeans (again)
Liaw, Andy
andy_liaw at merck.com
Fri Jun 6 04:19:35 CEST 2003
Just because you get the same answer from different starting points doesn't
mean the algorithm isn't using the starting points you specified.
I tried:
> set.seed(1)
> x <- matrix(rnorm(12), 6, 2)
> kmeans(x, x[c(1,6),], 1)
$cluster
[1] 2 1 2 1 1 2
$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512
$withinss
[1] 2.86861843 0.04450923
$size
[1] 3 3
> kmeans(x, 2, 1)
$cluster
[1] 2 1 2 1 1 2
$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512
$withinss
[1] 2.86861843 0.04450923
$size
[1] 3 3
> kmeans(x, x[c(3,4),], 1)
$cluster
[1] 1 1 1 2 1 1
$centers
        [,1]       [,2]
1 -0.3538799  0.7406319
2  1.5952808 -0.3053884
$withinss
[1] 2.089050 0.000000
$size
[1] 5 1
which shows that the result *can* depend on the starting values.
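To see why two different starts can still end up at the same place, it may help to do one step by hand. Below is a minimal sketch of a single Lloyd-style step (assign each point to its nearest center, then recompute the centers as cluster means), applied to the data from the quoted message. lloyd_step is a hypothetical helper written for this illustration, not the code kmeans() runs internally. kmeans() may use a different algorithm (Hartigan and Wong's method, for instance, can also move points between clusters within an iteration), so this is not expected to reproduce its output.

```r
# One Lloyd-style step: assign each row of x to its nearest center
# (squared Euclidean distance), then recompute each center as the
# mean of the rows assigned to it.
# NOTE: lloyd_step is a hypothetical helper for illustration only.
# It does not handle empty clusters (their centers would be NaN).
lloyd_step <- function(x, centers) {
  k <- nrow(centers)
  # n x k matrix of squared distances from every point to every center
  d2 <- sapply(seq_len(k), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  })
  # index of the smallest distance in each row (first one on ties)
  cluster <- max.col(-d2, ties.method = "first")
  # k x p matrix of updated centers
  new_centers <- t(sapply(seq_len(k), function(j) {
    colMeans(x[cluster == j, , drop = FALSE])
  }))
  list(cluster = cluster, centers = new_centers)
}

# Data from the quoted message
data <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

a <- lloyd_step(data, data[c(3, 4), ])  # start from rows 3 and 4
b <- lloyd_step(data, data[c(1, 6), ])  # start from rows 1 and 6

a$cluster  # 1 1 1 2 1 2
b$cluster  # 1 1 1 2 2 2
```

With this simple step the two starts give different assignments, so the starting centers clearly do enter the computation. Whether they still differ once an algorithm has finished refining the clusters is a separate question.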
Andy
> -----Original Message-----
> From: Luis Torgo [mailto:ltorgo at liacc.up.pt]
> Sent: Thursday, June 05, 2003 2:05 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] kmeans (again)
>
>
> Regarding a previous question concerning the kmeans function I've tried the
> same example and I also get a strange result (at least according to what is
> said in the help of the function kmeans). Apparently, the function is
> disregarding the initial cluster centers one gives it. According to the help
> of the function:
>
> centers: Either the number of clusters or a set of initial cluster
> centers...
>
> Now a small dataset:
> > data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
>
> If I use rows 3 and 4 as cluster centers and a single iteration of kmeans I
> get:
> > kmeans(data,data[c(3,4),],1)
> $cluster
> [1] 1 1 1 1 2 2
>
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
>
> $withinss
> [1] 32.9375 6.5000
>
> $size
> [1] 4 2
>
> If I now use rows 1 and 6 as cluster centers I get exactly the same solution
> after the first iteration:
>
> > kmeans(data,data[c(1,6),],1)
> $cluster
> [1] 1 1 1 1 2 2
>
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
>
> $withinss
> [1] 32.9375 6.5000
>
> $size
> [1] 4 2
>
> So, apparently the function is disregarding the initial cluster centers
> information. This is even "confirmed" by the fact that if I use the function
> without cluster centers, simply given the number of clusters, I get the same
> solution:
> > kmeans(data,2,1)
> $cluster
> [1] 2 2 2 2 1 1
>
> $centers
>    [,1] [,2]
> 1 8.000 2.50
> 2 0.875 2.25
>
> $withinss
> [1] 6.5000 32.9375
>
> $size
> [1] 2 4
>
>
>
> --
> Luis Torgo
> FEP/LIACC, University of Porto Phone : (+351) 22 607 88 30
> Machine Learning Group Fax : (+351) 22 600 36 54
> R. Campo Alegre, 823 email : ltorgo at liacc.up.pt
> 4150 PORTO - PORTUGAL WWW :
> http://www.liacc.up.pt/~ltorgo