[R] Examining how cases are similar by cluster, in cluster analysis
Bob Green
bgreen at dyson.brisnet.org.au
Sun Nov 18 22:22:11 CET 2012
David,
Many thanks, I'm sure this will be helpful. What would also be
helpful is if I could extract each cluster and examine the ids by
variable within that cluster. I could index the variables for each
cluster and run such an analysis, but there must be a more efficient
way of doing this (especially as I experiment with different
clustering methods).
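A minimal sketch of one way to do this, assuming (as with read.csv(..., row.names=1)) that the case ids are the row names of FS1: split() the data frame by the cluster membership vector, which yields one small data frame per cluster with ids in rows and variables in columns. The toy data and the membership vector hcli8 below are made up for illustration. Because split() needs only a grouping vector, the same line should work unchanged with membership from other methods, e.g. kmeans()$cluster.

```r
# Toy 0/1 data standing in for FS1 (made up); row names play the role of case ids
FS1 <- data.frame(
  depressed = c(0, 1, 0, 1, 1),
  unclear   = c(1, 1, 1, 0, 0),
  row.names = c("1641", "2295", "2594", "2654", "2799")
)

# Hypothetical cluster membership, one entry per row of FS1, e.g. as returned
# by cutree(cl.test, k = 8) or kmeans(FS1, 8)$cluster
hcli8 <- c(1, 1, 1, 2, 2)

# One data frame per cluster: rows are case ids, columns are variables
by_cluster <- split(FS1, hcli8)

# Case ids by variable for cluster 1
print(by_cluster[["1"]])
```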
Thanks again,
Bob
At 06:44 AM 19/11/2012, David L Carlson wrote:
>If you just want a summary of the mean for each variable in each
>cluster, this will get you there:
>
> > set.seed(42)
> > FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE),
> +     nrow=63, ncol=12))
> > dmat <- dist(FS1, method="binary")
> > cl.test <- hclust(dmat, method="average")
> > plot(cl.test, hang=-1)
> > hcli8 <- cutree(cl.test, k=8)
> > tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
> > print(tbl, digits=4)
>  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9    X10    X11   X12
>1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366 0.4146 0.4634 0.561
>2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000 0.6667 0.0000 0.000
>3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571 0.8571 0.6429 0.500
>4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.000
>5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.000
>6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.000
>7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
>8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
> >
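Because the variables are coded 0/1, each mean in this table is the proportion of a cluster's cases coded 1, so a value such as 0.6667 is easier to read once the cluster size is known. The sketch below, using made-up toy data and a hypothetical membership vector, appends the counts from table() to the aggregate() result; both are ordered by ascending cluster number, so the rows line up.

```r
# Toy 0/1 data and a hypothetical membership vector (illustration only)
FS1 <- data.frame(X1 = c(1, 0, 1, 1, 0, 0),
                  X2 = c(0, 0, 1, 1, 1, 0))
hcli8 <- c(1, 1, 1, 2, 2, 3)

# For 0/1 variables the mean is the proportion of cases coded 1
tbl <- aggregate(FS1, by = list(Group = hcli8), FUN = mean)

# Add the number of cases per cluster so proportions can be read as counts
tbl$n <- as.vector(table(hcli8))
print(tbl, digits = 4)
```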
>----------------------------------------------
>David L Carlson
>Associate Professor of Anthropology
>Texas A&M University
>College Station, TX 77843-4352
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
> > Sent: Sunday, November 18, 2012 5:00 AM
> > To: r-help at r-project.org
> > Subject: [R] Examining how cases are similar by cluster, in cluster analysis
> >
> > Hello,
> >
> > I used the following code to perform a cluster analysis on a
> > data frame of 63 cases and 12 variables (each coded 0/1).
> >
> > FS1 <- read.csv("D://Arsontest2.csv", header=TRUE, row.names=1)
> > str(FS1)
> > dmat <- dist(FS1, method="binary")
> > cl.test <- hclust(dmat, method="average")
> > plot(cl.test, hang = -1)
> >
> > Each case has an id, and the dendrogram identifies which cases
> > make up each cluster. What I am seeking advice on is how to
> > examine, within each cluster, the variables on which the cases
> > are similar.
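One way to answer this directly, sketched here with made-up toy data and a hypothetical helper shared_vars(): for each cluster, keep only the variables on which every member has the same value, and report that value. It relies on split() and assumes the case ids are the row names of the data frame.

```r
# Toy 0/1 data (made up); row names are case ids
FS1 <- data.frame(
  depressed = c(0, 1, 0, 1, 1),
  unclear   = c(1, 1, 1, 0, 0),
  revenge   = c(0, 0, 0, 1, 0),
  row.names = c("1641", "2295", "2594", "2654", "2799")
)
hcli8 <- c(1, 1, 1, 2, 2)  # hypothetical cluster membership

# Hypothetical helper: for each cluster, return the variables on which every
# member has the same value, together with that shared value
shared_vars <- function(data, groups) {
  lapply(split(data, groups), function(d) {
    same <- vapply(d, function(v) length(unique(v)) == 1L, logical(1))
    unlist(d[1, same, drop = FALSE])
  })
}

res <- shared_vars(FS1, hcli8)
print(res)
```

For the toy data, the members of cluster 1 all share unclear = 1 and revenge = 0, while they differ on depressed.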
> >
> >
> >
> > sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2 comprises
> > the following cases:
> >
> > 1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
> >    2    2    2    2    2    2    2    2    2    2
> >
> >
> >
> > The following code gives the mean of each variable by cluster. For
> > cluster 2, it appears that the cases have no clear motive and may
> > be depressed:
> >
> > round(sapply(x, function(i) colMeans(FS1[i, ])), 2)
> >
> >           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> > depressed 0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
> > unclear   0.33 1.00 1.00  1.0    0  0.0 0.07 0.12
> >
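The post does not show how x was built. Assuming it is a list of row indices, one element per cluster, it can be produced with split(), as in this sketch with made-up toy data. The result has variables in rows and clusters in columns (the transpose of the aggregate() table); with a named list the columns are labelled by cluster number rather than [,1], [,2] and so on.

```r
# Toy 0/1 data and hypothetical membership (illustration only)
FS1 <- data.frame(depressed = c(0, 1, 0, 1, 1),
                  unclear   = c(1, 1, 1, 0, 0))
hcli8 <- c(1, 1, 1, 2, 2)

# Assumption: x is a list of row indices, one element per cluster
x <- split(seq_len(nrow(FS1)), hcli8)

# Variables in rows, clusters in columns; drop = FALSE keeps the subset a
# data frame even if FS1 has only one column, so colMeans() still works
cl_means <- round(sapply(x, function(i) colMeans(FS1[i, , drop = FALSE])), 2)
print(cl_means)
```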
> >
> >
> > I can examine this manually, variable by variable, looking at how
> > the cases in cluster 2 are similar on each variable, but I am
> > looking for a more efficient and quicker way to do this.
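A complementary approach, sketched here with made-up toy data and hypothetical variable names, is to compare each cluster's proportions with those of the whole sample and rank the variables by the size of the difference. This highlights what sets a cluster apart, rather than only what its members share.

```r
# Toy 0/1 data and a hypothetical membership vector (illustration only)
FS1 <- data.frame(depressed = c(0, 1, 0, 1, 1, 0),
                  unclear   = c(1, 1, 1, 0, 0, 0),
                  revenge   = c(0, 0, 0, 1, 0, 1))
hcli8 <- c(1, 1, 1, 2, 2, 3)

# Proportion of cases coded 1: clusters in rows, variables in columns
props <- as.matrix(aggregate(FS1, by = list(Group = hcli8), FUN = mean)[, -1])
rownames(props) <- sort(unique(hcli8))

# Subtract the whole-sample proportion from each column; large positive or
# negative values mark variables on which a cluster stands out
diff_from_overall <- sweep(props, 2, colMeans(FS1))

# Cluster 1's variables, ordered from most to least distinctive
cl1 <- diff_from_overall["1", ]
ranked <- round(cl1[order(abs(cl1), decreasing = TRUE)], 2)
print(ranked)
```

For the toy data, cluster 1 stands out most on unclear, at 0.5 above the whole-sample proportion.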
> >
> > Bob
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.