[R] Using indexing to manipulate data
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Mar 18 09:41:19 CET 2010
Here are two solutions. The first uses merge and the second uses
sqldf. Both do a self-join of DF on Act and keep each unordered pair
only once by requiring the first actor's name to sort alphabetically
before the second's. The sqldf solution also sorts the result:
# input
DF <- structure(list(Actor = c("Jim", "Bob", "Bob", "Larry", "Alice",
"Tom", "Tom", "Tom", "Alice", "Nancy"), Act = c("A", "A", "C",
"D", "C", "F", "D", "A", "B", "B")), .Names = c("Actor", "Act"
), class = "data.frame", row.names = c(NA, -10L))
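# 1. merge: self-join DF on column 2 (Act), then keep each pair once
#    by requiring the first actor's name to sort before the second's
subset(unique(merge(DF, DF, by = 2)), Actor.x < Actor.y)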
# 2. sqldf: the same self-join in SQL, sorted by act and first actor
library(sqldf) # see http://sqldf.googlecode.com
sqldf("select A.Act, A.Actor as Actor1, B.Actor as Actor2
  from DF A join DF B on A.Act = B.Act
  where A.Actor < B.Actor
  order by A.Act, A.Actor")
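To see why the subset() condition leaves each pair only once, the merge() solution can be run step by step. This is a minimal sketch in base R, assuming the same data as DF above (rebuilt with data.frame() so the block runs on its own). The intermediate names all_pairs, pairs and pairs_once are illustrative only, and the final unique() on the two actor columns is an extra step, not part of the answer above, which only matters if two actors share more than one act.

```r
# Same data as DF above; character columns so that '<' compares names
DF <- data.frame(
  Actor = c("Jim", "Bob", "Bob", "Larry", "Alice",
            "Tom", "Tom", "Tom", "Alice", "Nancy"),
  Act   = c("A", "A", "C", "D", "C", "F", "D", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Step 1: self-join on Act (equivalent to by = 2 here). Every actor is
# paired with every actor in the same act, including themselves, so each
# unordered pair appears in both orders, (x, y) and (y, x), plus (x, x).
all_pairs <- merge(DF, DF, by = "Act")

# Step 2: keep rows where the first name sorts before the second.
# This drops the self-pairs and one of the two orders of each pair.
pairs <- subset(all_pairs, Actor.x < Actor.y)

# Step 3 (extra): a pair sharing several acts would still appear once per
# shared act; dropping Act and calling unique() keeps each pairing once.
pairs_once <- unique(pairs[c("Actor.x", "Actor.y")])
rownames(pairs_once) <- NULL
print(pairs_once)
```

With this data the result has six pairs, including Bob-Tom (both appear in act A), which the hand-written list in the question leaves out.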
On Thu, Mar 18, 2010 at 1:05 AM, duncandonutz <dwadswor at unm.edu> wrote:
>
> I know one of R's advantages is its ability to index, eliminating the need
> for control loops to select relevant data, so I thought this problem would
> be easy. I can't crack it. I have looked through past postings, but
> nothing seems to match this problem.
>
> I have a data set with one column of actors and one column of acts. I need
> a list that gives me a pair of actors in each row, provided they both
> participated in the same act.
>
> Example:
>
> The data look like this:
> Jim A
> Bob A
> Bob C
> Larry D
> Alice C
> Tom F
> Tom D
> Tom A
> Alice B
> Nancy B
>
> I would like this:
> Jim Bob
> Jim Tom
> Bob Alice
> Larry Tom
> Alice Nancy
>
> The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be
> counted only once.
> Thanks!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.