[R] A useful alternative to simple merging?

Tue Mar 17 18:22:18 CET 2009

For 1:1 merging of two or more datasets that have distinct sets of 
non-ID variables, would the following code be useful or faster?  It 
does not need to form the larger merged data frame.  [A generalization 
of with() would make this even better.  I've often wondered about the 
utility of a "merged environment".]

 > set.seed(1)
 > a <- data.frame(id=c(1:3, 5, 7), x1=runif(5))
 > b <- data.frame(id=c(1:3, 4, 6), x2=runif(5))
 > a
   id        x1
1  1 0.2655087
2  2 0.3721239
3  3 0.5728534
4  5 0.9082078
5  7 0.2016819
 > b
   id         x2
1  1 0.89838968
2  2 0.94467527
3  3 0.66079779
4  4 0.62911404
5  6 0.06178627
 >
 > ida <- a$id;  idb <- b$id
 > ids <- sort(unique(c(ida, idb)))
 > i <- match(ids, ida)
 > j <- match(ids, idb)
 > a[i,]$x1
[1] 0.2655087 0.3721239 0.5728534        NA 0.9082078        NA 0.2016819
 > b[j,]$x2
[1] 0.89838968 0.94467527 0.66079779 0.62911404         NA 0.06178627         NA
 >
 > with(a[i,],
+      with(b[j,],
+           cbind(x1,x2)))
             x1         x2
[1,] 0.2655087 0.89838968
[2,] 0.3721239 0.94467527
[3,] 0.5728534 0.66079779
[4,]        NA 0.62911404
[5,] 0.9082078         NA
[6,]        NA 0.06178627
[7,] 0.2016819         NA
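
The match() steps above could be wrapped in a small helper as a rough 
sketch of the "merged environment" idea.  The name aligned_vars() and 
its 'by' argument are hypothetical, not an existing R function.  It 
assumes strictly 1:1 IDs (match() only finds the first occurrence of 
each ID) and distinct non-ID variable names across the data frames.

```r
# Hypothetical helper (not part of base R): align the non-ID columns of
# several data frames on the union of their IDs, without calling merge().
# Assumes each ID occurs at most once per data frame and that non-ID
# column names do not clash across data frames.
aligned_vars <- function(..., by = "id") {
  dfs <- list(...)
  # Sorted union of all IDs, as in the example above
  ids <- sort(unique(unlist(lapply(dfs, function(d) d[[by]]))))
  out <- list(ids)
  names(out) <- by
  for (d in dfs) {
    k <- match(ids, d[[by]])          # NA where an ID is absent from d
    for (v in setdiff(names(d), by)) {
      out[[v]] <- d[[v]][k]           # subset the column only, not the rows
    }
  }
  out
}

set.seed(1)
a <- data.frame(id = c(1:3, 5, 7), x1 = runif(5))
b <- data.frame(id = c(1:3, 4, 6), x2 = runif(5))

m <- aligned_vars(a, b)
with(m, cbind(x1, x2))                # a single with() instead of nested calls

# Cross-check against the ordinary full merge
mm <- merge(a, b, by = "id", all = TRUE)
```

Because the result is a plain list of equal-length vectors, a single 
with() call replaces the nested ones, and no merged data frame is 
built.  Whether this is actually faster than merge() would need to be 
timed on realistic data.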

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University