[R] Using indexing to manipulate data

Thu Mar 18 17:57:47 CET 2010

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Lemon
> Sent: Thursday, March 18, 2010 1:33 AM
> To: duncandonutz
> Cc: r-help at r-project.org
> Subject: Re: [R] Using indexing to manipulate data
> 
> On 03/18/2010 04:05 PM, duncandonutz wrote:
> >
> > I know one of R's advantages is its ability to index, eliminating the need
> > for control loops to select relevant data, so I thought this problem would
> > be easy.  I can't crack it.  I have looked through past postings, but
> > nothing seems to match this problem.
> >
> > I have a data set with one column of actors and one column of acts.  I need
> > a list that will give me a pair of actors in each row, provided they both
> > participated in the same act.
> >
> > Example:
> >
> > The data looks like this:
> > Jim    A
> > Bob    A
> > Bob    C
> > Larry  D
> > Alice  C
> > Tom    F
> > Tom    D
> > Tom    A
> > Alice  B
> > Nancy  B
> >
> > I would like this:
> > Jim    Bob
> > Jim    Tom
> > Bob    Alice
> > Larry  Tom
> > Alice  Nancy
> >
> > The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be
> > counted only once.

You can use merge() to get all possible within-
group pairings and then eliminate the self-pairings
and the same-but-for-order pairings with the following
code:

  > data <- read.table(header=FALSE, textConnection("
  + Jim    A
  + Bob    A
  + Bob    C
  + Larry  D
  + Alice  C
  + Tom    F
  + Tom    D
  + Tom    A
  + Alice  B
  + Nancy  B
  + ")) # column names are now V1 and V2
  > # add sequence numbers for the elimination step
  > data$seq <- seq_len(nrow(data))
  > tmp <- merge(data,data,by="V2")
  > result <- tmp[tmp$seq.x < tmp$seq.y,] # drop self-pairings and reversed duplicates
  > result
     V2  V1.x seq.x  V1.y seq.y
  2   A   Jim     1   Bob     2
  3   A   Jim     1   Tom     8
  6   A   Bob     2   Tom     8
  11  B Alice     9 Nancy    10
  15  C   Bob     3 Alice     5
  20  D Larry     4   Tom     7
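A variant of the same merge() idea, sketched under a few assumptions: an R recent enough that read.table() accepts a text= argument, hypothetical column names actor and act, and a result that keeps only the two name columns. As an addition beyond the transcript above, each pair is put in alphabetical order with pmin()/pmax() before unique(), so two actors who happen to share more than one act are still listed once. Note that Bob and Tom both appear in act A, so any within-act pairing returns Bob-Tom as well, even though the wished-for list in the question omits it.

```r
# Sketch: merge() on the act column, then keep one row per unordered pair.
acts <- read.table(header = FALSE, stringsAsFactors = FALSE, text = "
Jim   A
Bob   A
Bob   C
Larry D
Alice C
Tom   F
Tom   D
Tom   A
Alice B
Nancy B
")
names(acts) <- c("actor", "act")  # hypothetical, more readable column names

# All within-act pairings, including self-pairings and both orders.
both <- merge(acts, acts, by = "act")

# Drop self-pairings, then put each pair in alphabetical order.
both <- both[both$actor.x != both$actor.y, ]
pairs <- data.frame(first  = pmin(both$actor.x, both$actor.y),
                    second = pmax(both$actor.x, both$actor.y),
                    stringsAsFactors = FALSE)

# Collapse duplicates: Jim-Bob vs. Bob-Jim, or a pair sharing several acts.
pairs <- unique(pairs)
rownames(pairs) <- NULL
pairs
```

Sorting the names within each pair is what makes unique() sufficient here; the seq.x < seq.y filter above only compares row positions, so it cannot merge the same two actors coming from different acts.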
  
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Hi duncandonutz,
> Try this:
> 
> # read the actor/act table: V1 = actor, V2 = act
> actnames<-read.table("junkfunc/names.dat",stringsAsFactors=FALSE)
> actorpairs<-NULL
> for(act in unique(actnames$V2)) {
>   # actors who took part in this act
>   actors<-actnames$V1[actnames$V2 == act]
>   nactors<-length(actors)
>   if(nactors > 1) {
>    # each column of indices holds the positions of one pair in actors
>    indices<-combn(nactors,2)
>    for(i in seq_len(ncol(indices)))
>     actorpairs<-
>      rbind(actorpairs,c(actors[indices[1,i]],actors[indices[2,i]]))
>   }
> }
> actorpairs
> 
> Jim
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
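Jim's loop grows actorpairs with rbind() one pair at a time. A loop-free sketch of the same combn() idea is below. It assumes the data sit in a data frame laid out the way read.table(header = FALSE) produces it (actors in V1, acts in V2); the inline text= data stands in for the file junkfunc/names.dat, whose contents are not shown, and by_act is a hypothetical helper name.

```r
# Sketch: one combn() call per act, no explicit loops or growing rbind().
actnames <- read.table(header = FALSE, stringsAsFactors = FALSE, text = "
Jim   A
Bob   A
Bob   C
Larry D
Alice C
Tom   F
Tom   D
Tom   A
Alice B
Nancy B
")

# Character vector of actors for each act (V2), in data order.
by_act <- split(actnames$V1, actnames$V2)

# Acts with a single actor (here F) cannot form a pair.
by_act <- by_act[vapply(by_act, length, integer(1)) > 1]

# combn() on a character vector gives a 2-row matrix of names; t() turns
# each pair into a row, and do.call(rbind, ...) stacks the acts.
actorpairs <- do.call(rbind, lapply(by_act, function(actors) t(combn(actors, 2))))
actorpairs
```

Like the loop, this lists a pair once per act the two actors share; if a pair could share several acts, normalising each row's order (for example with pmin()/pmax()) before calling unique() would give a strict once-per-pair list.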