[R] Advice on exploration of sub-clusters in hierarchical dendrogram

Thu Feb 23 16:54:49 CET 2012

Dear R users,

I am a biochemist/bioinformatician, currently working on clustering proteins
by conformational similarity.

I started working seriously with R only a couple of months ago.
So far I have been able to read my way through tutorials and set up my
hierarchical clusterings. My problem is that I cannot find a way to obtain
information on the sub-trees rooted at specific nodes, i.e. at specific
clusters of interest.
In other words, I am trying to obtain the sub-clusters of a specific
cluster in the dendrogram, by isolating a specific node and exploring its
lower hierarchy locally.
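
One way to isolate a node without re-clustering might be the `cut()` method for dendrograms in base R (stats), which splits a tree at a height `h` into an `$upper` part and a list of `$lower` branches, one sub-dendrogram per cluster. A minimal sketch, assuming simulated 2-D points in place of `mydata.txt` and a hypothetical protein label `p5`:

```r
# Simulated stand-in for the real data (hypothetical labels p1..p20)
set.seed(1)
pts <- matrix(rnorm(40), nrow = 20,
              dimnames = list(paste0("p", 1:20), c("x", "y")))
d <- dist(pts)
z <- hclust(d, method = "complete")
x <- as.dendrogram(z)

# Cut the dendrogram at h = 1.6; $lower holds one sub-dendrogram per cluster
parts <- cut(x, h = 1.6)
branches <- parts$lower

# Find the branch (cluster) that contains a protein of interest
target <- "p5"   # hypothetical label
hit <- which(vapply(branches, function(b) target %in% labels(b), logical(1)))
node <- branches[[hit]]

labels(node)            # members of that cluster
attr(node, "height")    # height at which that cluster was formed

# Draw only this sub-tree (skipped for single-leaf clusters and in batch runs)
if (interactive() && !is.leaf(node)) {
  plot(node, main = paste("Cluster containing", target))
}
```

The `node` object is itself a dendrogram, so it can be cut again at a lower height in the same way.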

For your reference, here is some of the code I have been using:

library(cluster) # provides clusplot()

df <- read.table('mydata.txt', header=TRUE, row.names=1) # read file with distance matrix
d <- as.dist(df) # format table as distance matrix
z <- hclust(d, method="complete", members=NULL)
x <- as.dendrogram(z)
plot(x, xlab="mydata complete-LINKAGE", ylim=c(0,4)) # visualization of the dendrogram
clusters <- cutree(z, h=1.6) # obtain clusters at cutoff height=1.6
ord <- cmdscale(d, k=2) # multidimensional scaling of the data down to 2 dimensions
clusplot(ord, clusters, color=TRUE, shade=TRUE, labels=4, lines=0)
# visualization of the clusters in 2D map

# var(clusters==1) only gives the variance of a 0/1 membership flag;
# one measure of the spread of cluster 1 is the variance of its
# within-cluster distances:
in1 <- clusters == 1
var1 <- var(as.vector(as.dist(as.matrix(d)[in1, in1])))

# extract cluster memberships:
clids <- as.data.frame(clusters)
names(clids) <- c("id")
clids$cdr <- row.names(clids)
row.names(clids) <- 1:nrow(clids)
clstructure <- lapply(unique(clids$id), function(x) {
  clids[clids$id == x, 'cdr']
})

clstructure[[1]] # get memberships of cluster 1
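
The membership extraction above can probably be shortened with base R's `split()`, which groups the protein names by cluster number and returns a list in the same order as `clstructure`. A sketch, assuming simulated data in place of `mydata.txt`:

```r
# Simulated stand-in for the real data (hypothetical labels p1..p20)
set.seed(1)
pts <- matrix(rnorm(40), nrow = 20,
              dimnames = list(paste0("p", 1:20), c("x", "y")))
z <- hclust(dist(pts), method = "complete")
clusters <- cutree(z, h = 1.6)   # named vector: protein -> cluster number

# One list element per cluster, named "1", "2", ..., holding member names
clstructure <- split(names(clusters), clusters)

clstructure[[1]]        # members of cluster 1
clstructure[["2"]]      # members of cluster 2, looked up by name
lengths(clstructure)    # cluster sizes
```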

From this point, I could in principle recreate a distance matrix with only
the members of a specific cluster, re-apply hierarchical clustering
and start all over again.
But doing this individually for hundreds of clusters would take me ages.
So I was hoping someone could point me in the right direction: how can I
take advantage of the initial dendrogram, focus on specific clusters, and
derive their sub-clusters at a new cutoff height?
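
Because each cluster obtained at height `h` is itself a node of the original tree, its sub-clusters at a lower height `h2` can be read straight from that node, for all clusters at once, instead of re-running `hclust()`. For complete linkage, with no tied distances, that sub-tree should match what `hclust()` gives on the cluster's members alone. A minimal sketch, assuming simulated data, a hypothetical second cutoff `h2 = 0.8` and a hypothetical helper `subclusters_at()`:

```r
# Simulated stand-in for the real data (hypothetical labels p1..p20)
set.seed(1)
pts <- matrix(rnorm(40), nrow = 20,
              dimnames = list(paste0("p", 1:20), c("x", "y")))
d <- dist(pts)
z <- hclust(d, method = "complete")
x <- as.dendrogram(z)

h  <- 1.6   # first cutoff, as in the original code
h2 <- 0.8   # hypothetical second, lower cutoff

# Hypothetical helper: member labels of each sub-cluster of `node` at `cutoff`.
# cut() is only used when the node lies above the cutoff; a node at or below
# it (e.g. a single leaf) is already one sub-cluster and is returned whole.
subclusters_at <- function(node, cutoff) {
  if (attr(node, "height") <= cutoff) return(list(labels(node)))
  lapply(cut(node, h = cutoff)$lower, labels)
}

# One entry per cluster at height h; each entry lists that cluster's
# sub-clusters at height h2, read from the original dendrogram
subclusters <- lapply(cut(x, h = h)$lower, subclusters_at, cutoff = h2)

str(subclusters[1:2])   # subclusters[[i]][[j]]: sub-cluster j of cluster i
```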

I recently found the following code on this page:
http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual
# Select sub-cluster number (here: clid=c(1,2)) and generate corresponding dendrogram.
clid <- c(1,2)
ysub <- y[names(mycl[mycl%in%clid]),]
hrsub <- hclust(as.dist(1-cor(t(ysub), method="pearson")), method="complete")
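
In that snippet, `y` appears to be a data matrix with one row per object and `mycl` a named `cutree()` result; `names(mycl[mycl %in% clid])` picks the objects in the selected clusters, and only their rows are re-clustered with 1 minus the Pearson correlation as distance. A self-contained sketch under those assumptions, with a simulated `y` and a hypothetical `k = 4` for `mycl`:

```r
# Hypothetical data matrix: 30 objects (rows) x 8 measurements (columns)
set.seed(2)
y <- matrix(rnorm(240), nrow = 30,
            dimnames = list(paste0("g", 1:30), paste0("s", 1:8)))

# Cluster all rows, using 1 - Pearson correlation between rows as distance
hr <- hclust(as.dist(1 - cor(t(y), method = "pearson")), method = "complete")
mycl <- cutree(hr, k = 4)   # hypothetical: 4 clusters, named by row

clid <- c(1, 2)                              # clusters to zoom into
ysub <- y[names(mycl[mycl %in% clid]), ]     # rows in clusters 1 and 2
hrsub <- hclust(as.dist(1 - cor(t(ysub), method = "pearson")),
                method = "complete")
```

With a precomputed distance matrix instead of raw data, the same idea becomes subsetting that matrix by member names rather than recomputing correlations.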

Even with this example, I am afraid I can't work out how to apply it to my case.
So I guess I could grab all the members of a specific cluster using my
existing code, and then try to reduce the distance matrix to one that
only contains the distances between those members:
cluster1members <- clstructure[[1]]

Then I need to turn that into a new distance matrix, say d1, which I
can feed to a new, local hierarchical clustering:
hrsub <- hclust(d1, method="complete")

Any ideas on how I can obtain a new distance matrix with just the distances
between the members of that cluster, whose names are contained in the vector
"cluster1members"?
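
A `dist` object can be subset by converting it to a full matrix with `as.matrix()`, indexing rows and columns by name, and converting back with `as.dist()`. A minimal sketch, assuming simulated data in place of `mydata.txt`; the final `lapply()` is one hypothetical way to repeat the local clustering for every cluster:

```r
# Simulated stand-in for the real data (hypothetical labels p1..p20)
set.seed(1)
pts <- matrix(rnorm(40), nrow = 20,
              dimnames = list(paste0("p", 1:20), c("x", "y")))
d <- dist(pts)
z <- hclust(d, method = "complete")
clusters <- cutree(z, h = 1.6)
clstructure <- split(names(clusters), clusters)

# Full square matrix with row and column names, so it can be indexed by name
dm <- as.matrix(d)

# Distance matrix restricted to the members of cluster 1
# (drop = FALSE keeps a 1 x 1 matrix if the cluster is a singleton)
cluster1members <- clstructure[[1]]
d1 <- as.dist(dm[cluster1members, cluster1members, drop = FALSE])

# Local clustering of cluster 1 (hclust() needs at least 2 objects)
if (length(cluster1members) > 1) {
  hrsub <- hclust(d1, method = "complete")
}

# Same idea for every cluster with at least 2 members
multi <- clstructure[lengths(clstructure) > 1]
local_trees <- lapply(multi, function(m)
  hclust(as.dist(dm[m, m]), method = "complete"))
```

Because `df` in the original code already holds the full matrix, `as.dist(df[cluster1members, cluster1members])` should give the same `d1`, provided the column names survived `read.table()` unchanged (see its `check.names` argument).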

Apologies if this seems trivial, but I really can't find the right
functions for this task.
Thank you very much in advance. As I am really a novice with R, small
chunks of example code would be of great help.

Take care, all.

--
View this message in context: http://r.789695.n4.nabble.com/Advice-on-exploration-of-sub-clusters-in-hierarchical-dendrogram-tp4414277p4414277.html
Sent from the R help mailing list archive at Nabble.com.