[R] Speeding up "accumulation" code in large matrix calc?
Petr Savicky
savicky at cs.cas.cz
Fri Feb 24 22:57:25 CET 2012
On Fri, Feb 24, 2012 at 09:06:28PM +0100, Berend Hasselman wrote:
[...]
> Summary of results
>
> > Nrow <- 100
> > Ncol <- 1000
> > A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
> > print(rbind(z1,z2,z3,z4,z5,z6))
> user.self sys.self elapsed user.child sys.child
> z1 0.543 0.005 0.551 0 0
> z2 0.299 0.004 0.304 0 0
> z3 0.070 0.012 0.082 0 0
> z4 0.112 0.002 0.114 0 0
> z5 0.263 0.007 0.271 0 0
> z6 0.062 0.009 0.070 0 0
[...]
> The original function (t1) benefits a lot from compiling to byte code.
> The function that operates on columns (t3) seems to be always the quickest in this experiment.
Hi.
Let me include one more solution operating on rows to the test.
t4 <- function(A, outcome) {
oneRow <- function(x, outcome)
{
n <- length(x)
y <- 1:n
z <- y
z[c(FALSE, x[-n] != outcome)] <- 0
y - cummax(z)
}
t(apply(A, 1, oneRow, outcome=outcome))
}
library(compiler)
t1.c <- cmpfun(t1)
t3.c <- cmpfun(t3)
t4.c <- cmpfun(t4)
Nrow <- 100
Ncol <- 1000
A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
library(rbenchmark)
benchmark(t1(A,outcome=1),
t3(A,outcome=1),
t4(A,outcome=1),
t1.c(A,outcome=1),
t3.c(A,outcome=1),
t4.c(A,outcome=1),
columns=c("test", "user.self", "sys.self", "relative"),
replications=1)
test user.self sys.self relative
1 t1(A, outcome = 1) 0.744 0 46.5000
4 t1.c(A, outcome = 1) 0.284 0 17.6875
2 t3(A, outcome = 1) 0.076 0 4.7500
5 t3.c(A, outcome = 1) 0.072 0 4.6250
3 t4(A, outcome = 1) 0.016 0 1.0000
6 t4.c(A, outcome = 1) 0.020 0 1.1250
Here, t4(), t4.c() is faster than t3(), t3.c().
With
Nrow <- 20000
Ncol <- 50
A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
i get
test user.self sys.self relative
1 t1(A, outcome = 1) 7.444 0 20.233696
4 t1.c(A, outcome = 1) 2.740 0 7.442935
2 t3(A, outcome = 1) 0.368 0 1.000000
5 t3.c(A, outcome = 1) 0.368 0 1.002717
3 t4(A, outcome = 1) 0.592 0 1.605978
6 t4.c(A, outcome = 1) 0.536 0 1.456522
Here, t3() and t3.c() are faster than t4(), t4.c().
Petr.
More information about the R-help
mailing list