[R] is match slow?

Thomas Lumley tlumley at u.washington.edu
Tue Nov 20 18:35:38 CET 2001

On Tue, 20 Nov 2001, Agustin Lobo wrote:

> I'm doing
> m <- match(matriz, origen, 0)
> where matriz is a 270x900 matrix and
> origen a 11675 elements vector, and is taking
> a very long time.
> Is match a function
> implemented in C? If not, would a C
> code be faster?

Well, typing the function name at the R prompt gives
R> match
function (x, table, nomatch = NA, incomparables = FALSE)
{
    if (!is.logical(incomparables) || incomparables)
        .NotYetUsed("incomparables != FALSE")
    .Internal(match(if (is.factor(x)) as.character(x) else x,
        if (is.factor(table)) as.character(table) else table,
        nomatch))
}

showing that it is .Internal and thus in compiled C code. Looking at
src/main/unique.c reveals that it is implemented by sticking `table' in a
hash table and looking up each element of x, which is a pretty good
algorithm for this problem. If the hash function is good it will take
about length(table)+length(x) hash computations, and you won't be able to
beat that easily.
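
To make the idea concrete, here is a rough sketch of the same strategy written in R itself, using an environment as the hash table. This is only an illustration and not the C code in unique.c: hash_match() is a made-up name, the keys are built with as.character() (so it is only sensible for integers and non-empty strings, and it ignores NA handling), and it will be far slower than the real match().

```r
# Illustrative sketch only: hash_match() is a hypothetical helper,
# not R's internal implementation of match().
hash_match <- function(x, table, nomatch = NA_integer_) {
    nomatch <- as.integer(nomatch)

    # Step 1: put `table` into a hash table, one insertion per element.
    # Only the first position of each distinct value is stored, because
    # match() reports the position of the first match.
    h <- new.env(hash = TRUE)
    keys <- as.character(table)
    for (i in seq_along(keys)) {
        if (!exists(keys[i], envir = h, inherits = FALSE))
            assign(keys[i], i, envir = h)
    }

    # Step 2: one hash lookup per element of `x`.
    vapply(as.character(x), function(k) {
        if (exists(k, envir = h, inherits = FALSE))
            get(k, envir = h, inherits = FALSE)
        else
            nomatch
    }, integer(1), USE.NAMES = FALSE)
}
```

For small integer or character inputs, hash_match(x, table, 0) should agree with match(x, table, 0). The point is just that the work is one insertion per element of table plus one lookup per element of x, rather than comparing every element of x against every element of table.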

I don't even find it that slow:

> matriz<-matrix(rnorm(270*900),ncol=900)
> origen<-rnorm(11675)
> system.time(match(matriz,origen,0))
[1] 0.27 0.01 0.33 0.00 0.00

or with a lot of matches:

> matriz<-matrix(sample(1:20,270*900,TRUE),ncol=900)
> origen<-1:11675
> system.time(match(matriz,origen,0))
[1] 0.01 0.00 0.01 0.00 0.00


Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

