[R] breaks
Martin Maechler
maechler at stat.math.ethz.ch
Fri Jun 13 19:35:14 CEST 2003
>>>>> "DavidB" == David Brahm <brahm at alum.mit.edu>
>>>>> on Fri, 13 Jun 2003 10:56:29 -0400 writes:
DavidB> Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> findInterval()
DavidB> Hi, Martin. I wasn't aware of findInterval(). findInterval(x, vec) looks to
DavidB> me very similar to:
DavidB> R> cut(x, c(-Inf,vec,Inf), labels=FALSE, right=FALSE) - 1
DavidB> so I'm curious what the differences are (e.g. speed,
DavidB> duplicates in vec?). In any case, findInterval()
DavidB> and cut() ought to be in each other's "See Also",
DavidB> don't you think?
When I wrote the precursor of findInterval() about 10 years ago (to be
dyn.load()ed into S-plus), I wasn't yet aware of the several
alternatives.
However, when I added it to R, I knew about the N*ecdf()
alternative, i.e., ecdf() from package:stepfun which relies on
approx(....., method = "constant").
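To make the equivalence concrete, here is a minimal sketch of the three formulations side by side, using the example data from this thread. It assumes `vec' is sorted and free of duplicates (with ties, approx() would by default average the tied y values). The variable names are only for illustration. In current R versions, ecdf() and approx() both live in package stats.

```r
## Example data from this thread; xx.y plays the role of `vec' (sorted, no ties)
xx   <- c(-2.0, 1.4, -1.2, -2.2, 0.4, 1.5, -2.2, 0.2, -0.4, -0.9)
xx.y <- c(-2.2000000, -0.9666667, 0.2666667, 1.5000000)
N    <- length(xx.y)

## 1. findInterval(): index i with vec[i] <= x < vec[i+1] (0 left of vec[1])
i.find <- findInterval(xx, xx.y)

## 2. N * ecdf(vec)(x): ecdf() gives the fraction of vec elements <= x;
##    round() guards against floating-point error in the product
i.ecdf <- round(N * ecdf(xx.y)(xx))

## 3. approx(..., method = "constant") through the points (vec[i], i):
##    f = 0 takes the left value, 0 below vec[1], N above vec[N]
i.approx <- approx(xx.y, seq_len(N), xout = xx, method = "constant",
                   f = 0, yleft = 0, yright = N)$y

stopifnot(i.ecdf == i.find, i.approx == i.find)
print(i.find)
## [1] 1 3 1 1 3 4 1 2 2 2
```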
I found that findInterval() was faster than approx() even for
unsorted `x' (by about a factor of 2 for large `vec') in my test
cases, but the real speed of findInterval() comes into play when
`x' is sorted, which is very typical, e.g., for the evaluation of
piecewise functions (splines, etc.).
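A minimal sketch of that typical use: evaluating a piecewise-constant (step) function on a sorted grid. The knots, the interval values, and the helper step.eval() are hypothetical and chosen only for illustration. As far as I know, ?findInterval describes the search as O(n log N) in general and roughly O(n) for (almost) sorted `x'.

```r
## Hypothetical step function: value vals[i] on [knots[i], knots[i+1]),
## vals[length(vals)] to the right of the last knot, left.value below knots[1]
knots      <- c(0, 1, 2.5, 4)     # hypothetical sorted breakpoints
vals       <- c(10, 20, 30, 40)   # hypothetical value on each interval
left.value <- 0                   # hypothetical value for x < knots[1]

step.eval <- function(x, knots, vals, left.value) {
  i <- findInterval(x, knots)     # integer indices in 0 .. length(knots)
  c(left.value, vals)[i + 1L]     # shift by one so index 0 maps to left.value
}

x.grid <- seq(-1, 5, by = 0.5)    # sorted x: the fast case for findInterval()
y.grid <- step.eval(x.grid, knots, vals, left.value)
print(cbind(x.grid, y.grid))
```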
DavidB> R> xx <- c(-2.0, 1.4, -1.2, -2.2, 0.4, 1.5, -2.2, 0.2, -0.4, -0.9)
DavidB> R> xx.y <- c(-2.2000000, -0.9666667, 0.2666667, 1.5000000)
DavidB> R> findInterval(xx, xx.y)
DavidB> [1] 1 3 1 1 3 4 1 2 2 2
DavidB> R> cut(xx, c(-Inf,xx.y,Inf), labels=FALSE, right=FALSE) - 1
DavidB> [1] 1 3 1 1 3 4 1 2 2 2
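On the duplicates question: findInterval() only requires `vec' to be non-decreasing, so ties are allowed, whereas cut() stops with an error when its breaks are not unique. A small sketch with a hypothetical tied `vec':

```r
## Hypothetical non-decreasing breaks with a tie at 0
vec.dup <- c(-2.2, 0, 0, 1.5)
x       <- c(-1, 0, 0.5, 2)

## findInterval() accepts ties: an x equal to a tied knot gets the last tied index
fi <- findInterval(x, vec.dup)
print(fi)

## ... which is the same as counting the elements of vec.dup that are <= x
stopifnot(fi == sapply(x, function(v) sum(vec.dup <= v)))

## cut() rejects duplicated breaks; capture the error message instead of failing
res.cut <- tryCatch(
  cut(x, c(-Inf, vec.dup, Inf), labels = FALSE, right = FALSE) - 1,
  error = function(e) conditionMessage(e))
print(res.cut)
```

So with ties, findInterval() returns the number of elements of `vec' that are <= x, while the cut() formulation only works for strictly increasing breaks.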
cut() is still considerably slower than the ecdf() / approx() version
for long `vec' ...
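A rough timing sketch for long `vec' follows. The sizes and the seed are arbitrary choices, not the simulation mentioned here, and both absolute and relative timings depend on the machine and the R version (cut() itself has changed since 2003, so the ranking may differ today). unique() guards against repeated runif() draws, which cut() would reject as duplicated breaks.

```r
set.seed(1)
vec <- sort(unique(runif(1e5)))   # long, sorted, duplicate-free breakpoints
N   <- length(vec)
x   <- runif(1e5)                 # unsorted query points
xs  <- sort(x)                    # pre-sorted copy (sorting time excluded below)

t.find   <- system.time(r1 <- findInterval(x, vec))
t.approx <- system.time(r2 <- approx(vec, seq_len(N), xout = x,
                                     method = "constant", f = 0,
                                     yleft = 0, yright = N)$y)
t.cut    <- system.time(r3 <- cut(x, c(-Inf, vec, Inf), labels = FALSE,
                                  right = FALSE) - 1)
t.sorted <- system.time(r4 <- findInterval(xs, vec))

## All formulations must agree before the timings mean anything
stopifnot(r2 == r1, r3 == r1, r4 == sort(r1))

elapsed <- c(findInterval        = t.find[["elapsed"]],
             approx              = t.approx[["elapsed"]],
             cut                 = t.cut[["elapsed"]],
             findInterval.sorted = t.sorted[["elapsed"]])
print(elapsed)
```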
I really should write a small article about this for "R News",
where I'd also show the simulation results...
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
More information about the R-help mailing list