[BioC] Where classification worked better ?

Sun Sep 20 15:50:00 CEST 2009

Hi Santana,

>>Having applied ONE clustering method separately to TWO (similar) types of datasets, I wonder how
>>a)      I can determine where the method worked better (not merely based on
>>visualizing the plots)!

You can compare the ratio (between-cluster variance) / (within-cluster variance) across the two datasets. It should be higher for the dataset where the clustering worked better. The comparison should still hold even if the number of clusters is not exactly the same (but very similar) for both datasets. I am not sure this measure is provided by the two clustering methods you used, but you can compute it yourself once you have identified which genes belong to which cluster and what the center of each cluster is.
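As a rough sketch of that computation in R: the helper name bw_ratio and the simulated matrices below are made up for illustration, and the ratio uses sums of squares rather than variances normalized by degrees of freedom, so treat it as one possible way to do it rather than a standard function.

```r
# Hypothetical helper: between-cluster / within-cluster sum of squares.
# x: numeric matrix with one row per gene; cluster: vector of cluster labels.
bw_ratio <- function(x, cluster) {
  x <- as.matrix(x)
  grand_mean <- colMeans(x)
  within <- 0
  between <- 0
  for (k in unique(cluster)) {
    xk <- x[cluster == k, , drop = FALSE]
    center <- colMeans(xk)
    # squared distances of the cluster members to their own center
    within <- within + sum(sweep(xk, 2, center)^2)
    # squared distance of the center to the grand mean, weighted by cluster size
    between <- between + nrow(xk) * sum((center - grand_mean)^2)
  }
  between / within
}

set.seed(1)
# Simulated data (assumption): "tight" has well-separated groups, "loose" does not
tight <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
loose <- rbind(matrix(rnorm(100, mean = 0, sd = 1.5), ncol = 2),
               matrix(rnorm(100, mean = 3, sd = 1.5), ncol = 2))

fit_tight <- kmeans(tight, centers = 2, nstart = 10)
fit_loose <- kmeans(loose, centers = 2, nstart = 10)

ratio_tight <- bw_ratio(tight, fit_tight$cluster)
ratio_loose <- bw_ratio(loose, fit_loose$cluster)
```

The helper only needs the data and the cluster labels, so it can be applied to the output of any clustering method. In current versions of R, kmeans also returns betweenss and tot.withinss components, so for kmeans the same ratio should be available as fit$betweenss / fit$tot.withinss.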

>>b)      I can retrieve the clusters along with their respective contents.
>>For example: if two clusters A & B are found and the clusters contain different genes, how to access & save the genes of A and B.

The object returned by the function kmeans contains a component called "cluster" that tells you which cluster each row of your data matrix was assigned to (kmeans clusters the rows, so the genes should be in rows), as well as a "centers" component with one row for each cluster center. See the example in the help page of the kmeans function.
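A minimal sketch of how the gene lists could be pulled out and saved, assuming genes are in the rows of the matrix and carry row names. The simulated matrix expr, the output file names and the use of tempdir() are placeholders for illustration only.

```r
set.seed(42)
# Simulated expression matrix (assumption): 20 genes in rows, 6 samples in columns
expr <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

fit <- kmeans(expr, centers = 2, nstart = 10)

# fit$cluster holds one label per row; split the gene names by that label
genes_by_cluster <- split(rownames(expr), fit$cluster)

# Write one text file per cluster (placeholder location and file names)
out_dir <- tempdir()
for (k in names(genes_by_cluster)) {
  writeLines(genes_by_cluster[[k]],
             file.path(out_dir, paste0("cluster_", k, ".txt")))
}

# Expression values of the genes in the first cluster, if needed
expr_cluster1 <- expr[fit$cluster == 1, , drop = FALSE]
```

genes_by_cluster is a plain list with one character vector per cluster, so genes_by_cluster[["1"]] and genes_by_cluster[["2"]] play the role of clusters A and B.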

Regards,
Adi Tarca