[R] A useful alternative to simple merging?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Mar 17 18:22:18 CET 2009
In the case of 1:1 merging with distinct sets of non-ID variables in two
or more datasets, would the following code, which doesn't need to form
the larger merged data frame, be useful or faster? [A generalization of
with() would make this even better. I've often wondered about the
utility of a "merged environment".]
> set.seed(1)
> a <- data.frame(id=c(1:3, 5, 7), x1=runif(5))
> b <- data.frame(id=c(1:3, 4, 6), x2=runif(5))
> a
  id        x1
1  1 0.2655087
2  2 0.3721239
3  3 0.5728534
4  5 0.9082078
5  7 0.2016819
> b
  id         x2
1  1 0.89838968
2  2 0.94467527
3  3 0.66079779
4  4 0.62911404
5  6 0.06178627
>
> ida <- a$id; idb <- b$id
> ids <- sort(unique(c(ida, idb)))
> i <- match(ids, ida)
> j <- match(ids, idb)
> a[i,]$x1
[1] 0.2655087 0.3721239 0.5728534        NA 0.9082078        NA 0.2016819
> b[j,]$x2
[1] 0.89838968 0.94467527 0.66079779 0.62911404         NA 0.06178627         NA
>
> with(a[i,],
+      with(b[j,],
+           cbind(x1,x2)))
            x1         x2
[1,] 0.2655087 0.89838968
[2,] 0.3721239 0.94467527
[3,] 0.5728534 0.66079779
[4,]        NA 0.62911404
[5,] 0.9082078         NA
[6,]        NA 0.06178627
[7,] 0.2016819         NA
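
The "merged environment" idea could be sketched roughly as follows. This is only an illustration: with_merged is a hypothetical helper name, not an existing R function, and it assumes the merge is 1:1 (IDs unique within each data frame) and that the non-ID variable names are distinct across data frames. It aligns each column on the sorted union of IDs with match(), as in the transcript above, and evaluates the expression against the aligned columns without building a merged data frame.

```r
# Hypothetical helper (not part of base R): evaluate `expr` with the non-ID
# columns of several data frames visible at once, aligned on a shared ID.
# Assumes unique IDs within each data frame and distinct non-ID names.
with_merged <- function(dfs, expr, id = "id") {
  # Sorted union of all IDs, as in the transcript above
  ids <- sort(unique(unlist(lapply(dfs, function(d) d[[id]]))))
  cols <- list()
  for (d in dfs) {
    k <- match(ids, d[[id]])          # NA where an ID is absent from d
    for (v in setdiff(names(d), id)) {
      cols[[v]] <- d[[v]][k]          # aligned column, padded with NA
    }
  }
  cols[[id]] <- ids
  # Same evaluation pattern as with(): look up names in the aligned columns
  # first, then in the caller's environment
  eval(substitute(expr), cols, enclos = parent.frame())
}

set.seed(1)
a <- data.frame(id = c(1:3, 5, 7), x1 = runif(5))
b <- data.frame(id = c(1:3, 4, 6), x2 = runif(5))

res <- with_merged(list(a, b), cbind(x1, x2))

# Cross-check against an explicit full outer merge on id
m <- merge(a, b, by = "id", all = TRUE)
same <- isTRUE(all.equal(res, as.matrix(m[, c("x1", "x2")]),
                         check.attributes = FALSE))
```

Whether this is actually faster than merge() would have to be timed on realistic data; the sketch only shows that the nested with() calls can be wrapped in one function.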
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University