[R] distance between two matrices
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jan 28 16:08:47 CET 2004
On Wed, 28 Jan 2004, "Hüsing, Johannes" wrote:
> > Hi all,
> > Say I have a matrix A with dimension m x 2 and matrix B with
> > dimension n x 2. I would like to find the row in A that is closest to
> > each row in B. Here's an example (using a loop):
> >
> > set.seed(1)
> > A <- matrix(runif(12), 6, 2) # 6 x 2
> > B <- matrix(runif(6), 3, 2) # 3 x 2
> > m <- vector("numeric", nrow(B))
>
> make the lines below a function of a vector argument and
> apply it over the rows of B.
>
> ?apply for more info. You'll want to know about apply if
> you want to avoid loops (which is a good approach).
Unfortunately apply() is a wrapper for a for() loop, so it will not help
much (if at all).
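For illustration only, the apply() form suggested above might look like the sketch below. The helper name nearest_row is hypothetical (it is not from the original post), and the data are the small example matrices from the question. It returns the same kind of index vector as the explicit loop, but apply() still calls the helper once per row of B at the R level.

```r
set.seed(1)
A <- matrix(runif(12), 6, 2)  # 6 x 2 reference points
B <- matrix(runif(6), 3, 2)   # 3 x 2 query points

# Hypothetical helper: index of the row of A nearest to the point b
# (squared Euclidean distance, first minimum wins).
nearest_row <- function(b, A) {
  which.min((A[, 1] - b[1])^2 + (A[, 2] - b[2])^2)
}

# One call of nearest_row() per row of B; A is passed through "...".
m_apply <- apply(B, 1, nearest_row, A = A)
```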
> > for(j in 1:nrow(B)) {
> > d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
> > m[j] <- which.min(d)
> > }
You can improve this a bit: see predict.qda.
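One possible improvement, sketched here under the assumption that A has far fewer rows than B (as in the real data described in the question), is to loop over the rows of A instead and keep a running minimum for every row of B. This is not necessarily what predict.qda does; the variable names best and m_swap are illustrative.

```r
set.seed(1)
A <- matrix(runif(12), 6, 2)  # 6 x 2 reference points
B <- matrix(runif(6), 3, 2)   # 3 x 2 query points

best   <- rep(Inf, nrow(B))   # smallest squared distance seen so far, per row of B
m_swap <- integer(nrow(B))    # index of the row of A achieving it

# Loop over the (few) rows of A; each pass is vectorized over all rows of B.
for (i in seq_len(nrow(A))) {
  d <- (B[, 1] - A[i, 1])^2 + (B[, 2] - A[i, 2])^2
  closer <- d < best          # strict "<" keeps the first minimum, like which.min()
  best[closer]   <- d[closer]
  m_swap[closer] <- i
}
```

With m around 1K and n around 140K this needs about a thousand interpreted iterations rather than 140K, and only a few vectors of length n in memory.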
> > All I need is m[]. I would like to accomplish this without using the
> > loop if possible, since for my real data n > 140K and m > 1K. I hope
> > this makes sense.
>
> Thing is, the above approach requires all data to be in main memory.
> I hope this is not a problem.
A 140K x 2 array of doubles takes up about 2.2Mb, and R needs 10x that to run at all.
Several people have mentioned knn1 as a C-level equivalent of the loops
(and I timed it as probably fast enough). Roger Bivand mentioned
quadtrees, and that is one of a class of possible solutions if you need
extra speed. Which member of that class is suitable depends on the
spatial distribution of A and B (viewing the rows as 2D points), but it is
hard to do very much better for only around 1000 reference points.
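A minimal sketch of the knn1 approach, assuming the class package is installed (it is one of the recommended packages). The trick of labelling each row of A with its own index is an illustration, not something spelled out in the thread: the "class" predicted for each row of B is then the index of its nearest row in A.

```r
library(class)  # provides knn1(train, test, cl)

set.seed(1)
A <- matrix(runif(12), 6, 2)  # 6 x 2 reference points
B <- matrix(runif(6), 3, 2)   # 3 x 2 query points

# Use the row number of A as the class label of each reference point.
labels <- factor(seq_len(nrow(A)))

# knn1() returns a factor; convert it back to integer row indices.
m_knn <- as.integer(as.character(knn1(train = A, test = B, cl = labels)))
```

For data without exact ties this should agree with the which.min() loop from the question.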
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595