[R] merge( , by='row.names') slowness
dms
dschruth at gmail.com
Wed Mar 2 21:16:27 CET 2011
I noticed that joining two data.frames in R with the "merge"
function is substantially slower with by='row.names' than
when joining on a common index column.
With data frames of ~10,000 rows, the by='row.names' merge takes as
long as 10 minutes, versus about 1 second when using an index column.
Beyond 10^6 rows, it's unusably slow.
n <- 5  # each data frame gets 10^n rows
a <- data.frame(id=as.character(1:10^n), x=rnorm(10^n))
rownames(a) <- a$id
b <- data.frame(id=as.character(1:10^n + 10^(n-1)), y=rnorm(10^n))
rownames(b) <- b$id
date()
fast <- merge(a, b, all=TRUE)  # joins on the common column 'id'
date()
slow <- merge(a, b, all=TRUE, by='row.names')  # joins on row names
date()
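
A possible workaround, as a rough sketch only: copy the row names into an ordinary key column and merge on that column, so merge() takes the same path as the fast case. The helper name merge_via_key is made up for illustration and is not part of base R. The example uses system.time() rather than date() for finer-grained timings, and a small n so the by='row.names' call finishes quickly. The size of the gap is likely to depend on the R version, so treat the timings as something to check locally rather than a guaranteed result.

```r
# Hypothetical helper (not part of base R): copy the row names into a
# regular column and let merge() join on that column instead.
merge_via_key <- function(x, y, ...) {
  x$Row.names <- rownames(x)
  y$Row.names <- rownames(y)
  merge(x, y, by = "Row.names", ...)
}

n <- 3  # kept small here; raise it to reproduce the slowdown
ids_a <- as.character(seq_len(10^n))
ids_b <- as.character(seq_len(10^n) + as.integer(10^(n - 1)))
a <- data.frame(x = rnorm(10^n), row.names = ids_a)
b <- data.frame(y = rnorm(10^n), row.names = ids_b)

# Time both approaches on the same inputs.
t_key <- system.time(via_key <- merge_via_key(a, b, all = TRUE))
t_rn  <- system.time(via_rn  <- merge(a, b, by = "row.names", all = TRUE))

# Elapsed seconds for each approach.
print(c(key_column = t_key[["elapsed"]], row_names = t_rn[["elapsed"]]))
```

Both calls should return the same rows, with the key in a column named Row.names, so the helper can stand in for by='row.names' wherever the row names are unique.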
Has anybody else noticed this?