[R] finding clusters in a network

Gabor Csardi csardi at rmki.kfki.hu
Fri May 5 15:50:03 CEST 2006


I would recommend using the igraph package. I think it can be done with the
others as well, but I don't know them well.

If your ids are numeric, be sure that the first id is zero. Then create a
two-column matrix of your data with the parentid in the first column and the
id in the second, omitting the rows where the parentid is NA. So you need
something like this:

from1 to1
from2 to2

from1 -> to1 is the first directed edge, from2 -> to2 the second, etc.
If you have that (say in variable 'el') then it is pretty straightforward.
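
A minimal sketch of preparing such a matrix in base R. The data frame 'obs',
its column names and its values are made up for illustration, and the
recoding with match() is just one way to get ids that start at zero:

```r
# Toy data in the shape described in the question (illustrative values only).
obs <- data.frame(id       = c(10, 11, 12, 20, 21, 30),
                  parentid = c(NA, 10, 10, NA, 20, NA))

# Recode the original ids to 0, 1, ..., N-1 so that the first id is zero.
ids  <- sort(unique(obs$id))
from <- match(obs$parentid, ids) - 1   # NA where there is no (known) parent
to   <- match(obs$id, ids) - 1

# Drop rows without a parent; parent goes in column 1, child in column 2.
keep <- !is.na(from)
el   <- cbind(from[keep], to[keep])

# Number of distinct ids, including isolates that appear in no edge.
N <- length(ids)
```

Here 'el' ends up with three rows (0 1, 0 2 and 3 4) and 'N' is 6. A parentid
that does not occur among the ids would also become NA and be dropped.
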
If you have 'N' different ids:

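# t(el) lays the edges out as from1, to1, from2, to2, ...
g <- graph(t(el), n=N)
cl1 <- clusters(g, mode="weak")    # weakly connected components
cl2 <- clusters(g, mode="strong")  # strongly connected components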

These calculate the weakly and strongly connected components of your graph.
Strongly connected means that it has to be possible to reach each node from
every other node via a directed path. In the weakly connected case, a single
undirected path between two nodes (that is, a path that ignores the edge
directions) is enough for them to be in the same component.
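
To make the difference concrete, here is a small base R sketch that works out
both kinds of components on a toy graph by brute force, using a reachability
matrix. This is not how igraph does it, only an illustration of the two
definitions; the helper names are made up, and the toy ids start at 1 so that
they can index an R matrix directly:

```r
# Toy directed graph: a cycle 1 -> 2 -> 3 -> 1, an edge 3 -> 4, and isolate 5.
el <- rbind(c(1, 2), c(2, 3), c(3, 1), c(3, 4))
N  <- 5
adj <- matrix(0, N, N)
adj[el] <- 1

# reach[i, j] is TRUE if j can be reached from i (Warshall's algorithm).
reachability <- function(a) {
  n <- nrow(a)
  r <- (a > 0) | (diag(n) > 0)
  for (k in seq_len(n)) r <- r | outer(r[, k], r[k, ], "&")
  r
}

# Turn a "same component" matrix into consecutive component labels.
labels_from <- function(same) {
  first <- apply(same, 1, function(row) which(row)[1])
  match(first, unique(first))
}

# Strong: i reaches j AND j reaches i along directed paths.
r_dir  <- reachability(adj)
strong <- labels_from(r_dir & t(r_dir))

# Weak: ignore directions first, then ask for plain reachability.
weak <- labels_from(reachability(adj + t(adj)))
```

On this graph 'weak' puts nodes 1 to 4 in one component and leaves node 5 on
its own, while 'strong' keeps only the cycle 1, 2, 3 together and puts nodes 4
and 5 in components of their own.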

cl1$membership contains, for every node, the id of the cluster it belongs to
(cluster ids start at zero), and cl1$csize contains the sizes of the
clusters. The same applies to cl2.
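
Picking out the clusters of two or more observations could then look roughly
like this. The two vectors are hypothetical stand-ins for cl1$membership and
cl1$csize, with zero-based ids as described above; if your igraph version
numbers nodes and clusters from one, drop the "- 1" adjustments:

```r
# Hypothetical stand-ins for cl1$membership and cl1$csize.
membership <- c(0, 0, 1, 2, 2, 2, 3)
csize      <- c(2, 1, 3, 1)

# Ids of the clusters with at least two members (ids start at zero).
big <- which(csize >= 2) - 1

# Zero-based node ids, grouped by the cluster they belong to.
nodes  <- seq_along(membership) - 1
keep   <- membership %in% big
groups <- split(nodes[keep], membership[keep])
```

'groups' is a list with one element per remaining cluster, named by cluster
id; here cluster 0 holds nodes 0 and 1, and cluster 2 holds nodes 3, 4 and 5.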

Hope this helps,

On Thu, May 04, 2006 at 06:02:14PM -0400, jv37 at columbia.edu wrote:
> Hello,
> I have data with 1500+ observations. Each observation has two
> attributes _id_, a self identifier, and _parentid_, identifying a
> single association with a previous observation. The value of
> _parentid_ can either be the _id_ value of that single associated
> observation or NA. Different observations can be associated with
> the same previous observation and share the same value for
> _parentid_. In this way, these 1500+ observations form a directed
> graph made up of several disconnected clusters (of various sizes)
> and several isolates. I need to identify these disconnected
> clusters of two or more observations. I have been trying to figure
> out how to use the network, sna, and kinship packages, but, with
> little time before finals to read the relevant literature, I am
> desperate for some helpful advice.
> Thank you in advance--
> John

Csardi Gabor <csardi at rmki.kfki.hu>    MTA RMKI, ELTE TTK
