[R] assessing performance of a classification method

Allan Strand stranda at cofc.edu
Tue Mar 19 14:07:19 CET 2002

Hi all,

I have developed a routine to classify observations based upon
clustering.  In my current case there are 5 classes, so the data at
the end of the classification look like this:

obs   class
1      2
2      2
3      1
4      4
5      4
6      3
7      5
8      5
.      .
.      .

I always know the number of classes a priori.  I wanted to see how
well my approach is performing so I wrote a simulation to generate
observations in a fairly realistic manner.  I then run the simulated
observations through my scheme.  The "known" simulated data have the
same form as the results of the classification, but the class
identifiers may differ. In other words, a class of observations may be
constructed correctly by my approach even though the "name" it is
given differs from the one used in the simulated data.
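Because class names are arbitrary, comparing the two label vectors position by position overstates the error. Cross-tabulating true against predicted labels makes the correspondence visible; in R, `table(true, predicted)` produces this table directly. Below is a minimal Python sketch of the same idea. The function name `contingency` and the example labels are hypothetical.

```python
from collections import Counter


def contingency(true_labels, pred_labels):
    """Count observations for each (true label, predicted label) pair."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label vectors must have the same length")
    return Counter(zip(true_labels, pred_labels))


# Hypothetical example: the groups are recovered perfectly but renamed
# (true 1 -> predicted 3, true 2 -> predicted 1, true 3 -> predicted 2).
true_cls = [1, 1, 2, 2, 3, 3]
pred_cls = [3, 3, 1, 1, 2, 2]

tab = contingency(true_cls, pred_cls)
for (t, p), n in sorted(tab.items()):
    print(f"true {t} -> predicted {p}: {n}")

# A naive position-by-position comparison counts every observation as wrong.
naive_errors = sum(t != p for t, p in zip(true_cls, pred_cls))
print(naive_errors)  # 6
```

When the classes are recovered perfectly, each true class maps onto exactly one predicted class. Observations falling outside that one-to-one pattern are the candidates for misclassification.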

I would like to compare the results of my scheme to the "known"
simulated classes and assess its error rate.  As a start, I would just
like to know the number of observations that were misclassified.  No
doubt this is a brain-dead question to those who work in this field,
but this is my first foray into such analyses.  Ultimately I was
wondering if there is an R package that performs such analyses out of
the box, or if anyone who does this kind of analysis routinely has a
code snippet I could use as an example.
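One common way to count misclassifications when class names are arbitrary works in two steps. First, relabel the predicted classes using the one-to-one matching that agrees best with the known classes. Then count the remaining disagreements.

The sketch below is an illustration under stated assumptions, not an existing package function:

* It finds the matching by brute force over all matchings. This is cheap for 5 classes (5! = 120 candidates).
* `min_misclassified` is a hypothetical name.
* The scheme is assumed never to produce more classes than the simulation contains.

```python
from itertools import permutations


def min_misclassified(true_labels, pred_labels):
    """Return (errors, mapping) for the best one-to-one relabeling.

    mapping sends each predicted label to a true label; errors is the
    number of observations still in disagreement after relabeling.
    Brute force over all injective mappings: fine for a handful of
    classes (5 classes -> 120 candidates), too slow for many.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label vectors must have the same length")
    true_set = sorted(set(true_labels))
    pred_set = sorted(set(pred_labels))
    # Assumption: the scheme never produces more classes than the truth has.
    if len(pred_set) > len(true_set):
        raise ValueError("more predicted classes than true classes")
    best_errors, best_map = len(true_labels) + 1, None
    # Each permutation assigns a distinct true label to every predicted label.
    for perm in permutations(true_set, len(pred_set)):
        mapping = dict(zip(pred_set, perm))
        errors = sum(mapping[p] != t for t, p in zip(true_labels, pred_labels))
        if errors < best_errors:
            best_errors, best_map = errors, mapping
    return best_errors, best_map


# Hypothetical example with 3 classes and one genuine mistake (last obs).
true_cls = [1, 1, 2, 2, 3, 3, 3]
pred_cls = [3, 3, 1, 1, 2, 2, 1]
errors, mapping = min_misclassified(true_cls, pred_cls)
print(errors, mapping)          # 1 {1: 2, 2: 3, 3: 1}
print(errors / len(true_cls))   # misclassification rate
```

With many classes, brute force becomes impractical. The same optimal matching can instead be found with the Hungarian algorithm, for example by applying `scipy.optimize.linear_sum_assignment` with `maximize=True` to the contingency table of counts.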

Allan Strand,   Biology    http://linum.cofc.edu
College of Charleston      Ph. (843) 953-8085
Charleston, SC 29424       Fax (843) 953-5453
