[R] k-means with Euclidean distance but no coordinates

Corrin Lakeland lakeland at atlas.otago.ac.nz
Thu Dec 13 21:42:37 CET 2001


I'm trying to build a thesaurus that will give sensible values for rare
words.  I suspect the best algorithm to use is k-means, although I'm not
sure about that -- I would have preferred a k-dimensional space with a
binary cluster in each dimension so a word can belong to 0..k clusters,
but I digress...

I can measure the strength of correlation between words fairly easily by
counting co-occurrence divided by the frequency of each word, giving a
Euclidean distance, although this doesn't work especially well for rare
words.  However, I don't have coordinates as such, and deriving them from
the distances is non-trivial.
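
To make the measure concrete, here is a minimal sketch of one way such a
pairwise dissimilarity could be computed.  The exact normalisation is an
assumption, not something stated above: co-occurrence counts are divided by
the geometric mean of the two word frequencies, and the result is
subtracted from 1 so that larger values mean "further apart".  The function
name and the input format (a list of word lists, one per context) are
hypothetical.  A dissimilarity built this way is not guaranteed to behave
like a true Euclidean distance.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_dissimilarity(contexts):
    """Return (vocab, matrix) for a list of contexts (lists of words).

    Assumed measure: 1 - cooc(a, b) / sqrt(freq(a) * freq(b)), where
    freq counts the contexts containing a word and cooc counts the
    contexts containing both words.
    """
    freq = Counter()
    cooc = Counter()
    for ctx in contexts:
        words = sorted(set(ctx))           # count each word once per context
        freq.update(words)
        cooc.update(combinations(words, 2))

    vocab = sorted(freq)
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Words that never co-occur get the maximum dissimilarity of 1.
    d = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for (a, b), c in cooc.items():
        sim = c / sqrt(freq[a] * freq[b])  # lies in (0, 1]
        i, j = index[a], index[b]
        d[i][j] = d[j][i] = 1.0 - sim
    return vocab, d
```

Rare words have small counts, so a single chance co-occurrence moves their
score a great deal, which is consistent with the measure being unreliable
for them.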

Now, as I understand k-means, it uses Euclidean distance rather than
coordinates: the first step given in texts is to derive the distances from
the coordinates.  But I can't find a way to call the built-in function
without coordinates.  I had a look at R-1.3.1/src/library/mva/src/kmns.f,
but my Fortran isn't good and I had enough trouble following the code, so
I'm not up to making major changes.
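
For illustration, clustering directly from a distance matrix can be
sketched with k-medoids.  This is a different technique from k-means, not
the algorithm in kmns.f: each cluster centre is constrained to be one of
the items, so only pairwise distances are needed and no coordinates are
ever computed.  The function names below are hypothetical, and the simple
alternating scheme shown is only one of several k-medoids variants.  In R,
pam() in the cluster package accepts a dissimilarity object and may be a
ready-made alternative.

```python
import random


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100, seed=0):
    """Cluster items given only a symmetric distance matrix.

    Returns (medoids, labels): medoids are item indices, and labels[i]
    is the position in medoids of the medoid nearest to item i.
    """
    n = len(dist)
    rng = random.Random(seed)          # seeded so runs are repeatable
    medoids = rng.sample(range(n), k)  # random initial medoids

    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:            # empty cluster: keep the old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:     # converged
            break
        medoids = new_medoids

    return medoids, assign(dist, medoids)
```

The result depends on the random starting medoids, so in practice it may be
worth running with several seeds and keeping the solution with the smallest
total distance from items to their medoids.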

Any help or ideas would be appreciated.

Corrin Lakeland <lakeland at cs.otago.ac.nz> 
Department of Computer Science
University of Otago, New Zealand
