[R] re-ordering a vector by name

Liaw, Andy andy_liaw at merck.com
Sat May 8 04:44:12 CEST 2004


> From: Sundar Dorai-Raj 
> 
> Liaw, Andy wrote:
> 
> > Dear R-help,
> > 
> > Let's say `x1' and `x2' are very long vectors (length=5e5, say) with
> > same set of names but in different order.  If I want to sort `x2' in
> > the order of `x1', I would do 
> > 
> >   x2[names(x1)]
> > 
> > but the amount of time that takes is quite prohibitive!  Does anyone
> > have any suggestion on a more efficient way to do this?
> > 
> > If the two vectors are exactly the same length (as I said above),
> > sorting both by names would probably be the fastest.  However, if the
> > two vectors differ in length (and the names for the shorter one are a
> > subset of names of the longer one) then that doesn't work...
> > 
> > Best,
> > Andy
> 
> Hi Andy,
>    
> Using match seems to be *much* faster:
> 
> R> x1 <- 1:10000; names(x1) <- 1:10000
> R> x2 <- 1:10000; names(x2) <- 10000:1
> R> system.time(x3 <- x1[names(x2)])
> [1] 1.88 0.00 1.88   NA   NA
> R> system.time(x4 <- x1[match(names(x1), names(x2))])
> [1] 0.01 0.00 0.01   NA   NA
> R> all.equal(x3, x4)
> [1] TRUE
> R>
> 
> This should also work if x1 and x2 are of different lengths.
> 
> --sundar

Sundar,

Thanks very much for the tip!  However, I think the arguments in match() are
backward:

> n = 1e4
> x1 = sample(n)
> x2 = sample(n)
> names(x1) = sample(n)
> names(x2) = sample(n)
> system.time(x3 <- x1[names(x2)])
[1] 5.71 0.00 6.02   NA   NA
> system.time(x4 <- x1[match(names(x1),names(x2))])
[1] 0.03 0.00 0.03   NA   NA
> all.equal(x3, x4)
[1] "Names: 9997 string mismatches"       "Mean relative  difference: 0.669837"
> names(x3[1:5])
 [1] "5391" "9927" "6499" "1863" "8287"
> names(x4[1:5])
 [1] "2560" "9914" "6348" "1291" "5718"
> system.time(x4 <- x1[match(names(x2),names(x1))])
[1] 0.03 0.00 0.03   NA   NA
> names(x4[1:5])
 [1] "5391" "9927" "6499" "1863" "8287"
> all.equal(x3, x4)
[1] TRUE

[Admittedly this is why I rarely use match():  I get mixed up easily.]
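
For reference, here is a minimal sketch of the argument order on toy data.
The values and the names `idx', `by_match' and `by_name' are made up for
illustration.  match(x, table) gives, for each element of x, its position in
table, so the names whose order we want to follow go first:

```r
# Toy data (hypothetical values): x2's names are a subset of x1's names.
x1 <- c(a = 1, b = 2, c = 3, d = 4)
x2 <- c(c = 30, a = 10, d = 40)

# For each name of x2, find its position among the names of x1.
idx <- match(names(x2), names(x1))

by_match <- x1[idx]           # x1 rearranged to follow the order of x2
by_name  <- x1[names(x2)]     # same result via character indexing

print(identical(by_match, by_name))
```

On this toy case the two vectors also differ in length, which is the
situation where sorting both by name would not be enough.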

Reid: It isn't a memory problem.  For vectors of length 6e5, I killed the R
process after more than 5 hours on an Opteron 248.  The R process was taking
up about 114MB of RAM, out of 8GB in the box.  I'm rather surprised that
such a seemingly simple operation would take so long, especially when sorting
such vectors is very fast.  What am I missing?
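
For the equal-length case, the sort-by-names idea could look roughly like the
sketch below.  It assumes both vectors carry exactly the same set of unique
names; the seed, the size `n' and the names `sorted2', `rank1' and
`x2_in_x1_order' are only illustrative, and I have not timed this at the
sizes discussed above.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n  <- 1000                           # small illustrative size
x1 <- sample(n); names(x1) <- sample(n)
x2 <- sample(n); names(x2) <- sample(n)

# Sort x2 by its names, then undo the sort using the rank of each x1 name.
sorted2 <- x2[order(names(x2))]      # x2 in sorted-name order
rank1   <- order(order(names(x1)))   # rank of each x1 name in sorted order
x2_in_x1_order <- sorted2[rank1]     # x2 rearranged to follow x1

print(identical(names(x2_in_x1_order), names(x1)))
```

This relies only on order(), so it avoids character indexing entirely, but it
does not cover the case where one set of names is a proper subset of the
other.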

Best,
Andy



