[R] Speeding up "accumulation" code in large matrix calc?
Petr Savicky
savicky at cs.cas.cz
Fri Feb 24 21:55:46 CET 2012
On Fri, Feb 24, 2012 at 09:06:28PM +0100, Berend Hasselman wrote:
[...]
> I've done some speed tests.
>
> t1 <- function(m, outcome) {
> nrows <- dim(m)[1]
> ncols <- dim(m)[2]
> res <- matrix(rep.int(0, nrows*ncols), nrow=nrows)
> for(row in 1:nrows) {
> for(col in 2:ncols) {
> res[row,col] <- if(m[row,col-1]==outcome) {0} else {1+res[row,col-1]}
> }
> }
> res
> }
>
> # by row
> t2 <- function(A, outcome) {
> oneRow <- function(x, outcome)
> {
> n <- length(x)
> y <- c(0, cumsum(x[-n] == outcome))
> ave(x, y, FUN = function(z) seq.int(along=z) - 1)
> }
>
> t(apply(A, 1, oneRow, outcome=1))
> }
>
> # transformation
> t3 <- function(A, outcome) {
> B <- array(0, dim=dim(A))
> curr <- B[, 1]
> for (i in seq.int(from=2, length=ncol(A)-1)) {
> curr <- ifelse (A[, i-1] == outcome, 0, 1 + curr)
> B[, i] <- curr
> }
> B
> }
>
> # Compile functions to byte code
> library(compiler)
> t1.c <- cmpfun(t1)
> t2.c <- cmpfun(t2)
> t3.c <- cmpfun(t3)
>
> Nrow <- 100
> Ncol <- 1000
> A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
>
> z1 <- system.time(res1 <- t1(A,outcome=1))
> z2 <- system.time(res2 <- t2(A,outcome=1))
> z3 <- system.time(res3 <- t3(A, outcome=1))
> z4 <- system.time(res4 <- t1.c(A,outcome=1))
> z5 <- system.time(res5 <- t2.c(A,outcome=1))
> z6 <- system.time(res6 <- t3.c(A, outcome=1))
> print(all(res2==res1))
> print(all(res3==res1))
> print(all(res4==res1))
> print(all(res5==res1))
> print(all(res6==res1))
> print(rbind(z1,z2,z3,z4,z5,z6))
>
[...]
> ------------------------------------------
>
> Summary of results
>
> > Nrow <- 100
> > Ncol <- 1000
> > A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
> > print(rbind(z1,z2,z3,z4,z5,z6))
> user.self sys.self elapsed user.child sys.child
> z1 0.543 0.005 0.551 0 0
> z2 0.299 0.004 0.304 0 0
> z3 0.070 0.012 0.082 0 0
> z4 0.112 0.002 0.114 0 0
> z5 0.263 0.007 0.271 0 0
> z6 0.062 0.009 0.070 0 0
>
>
> > Nrow <- 1000
> > Ncol <- 100
> > A <- matrix((runif(Ncol*Nrow)<0.2)+0, nrow=Nrow)
> >
> > z1 <- system.time(res1 <- t1(A,outcome=1))
> > z2 <- system.time(res2 <- t2(A,outcome=1))
> > z3 <- system.time(res3 <- t3(A, outcome=1))
> > z4 <- system.time(res4 <- t1.c(A,outcome=1))
> > z5 <- system.time(res5 <- t2.c(A,outcome=1))
> > z6 <- system.time(res6 <- t3.c(A, outcome=1))
> user.self sys.self elapsed user.child sys.child
> z1 0.543 0.005 0.551 0 0
> z2 0.299 0.004 0.304 0 0
> z3 0.070 0.012 0.082 0 0
> z4 0.112 0.002 0.114 0 0
> z5 0.263 0.007 0.271 0 0
> z6 0.062 0.009 0.070 0 0
[...]
>
> The original function (t1) benefits a lot from compiling to byte code.
> The function that operates on columns (t3) seems to be always the quickest in this experiment.
Thank you for this. Intuition may be wrong.
Function t2() calls ave(), which contains a loop over the groups, so
its efficiency depends also on the number of groups. If the matrix is
generated with fewer 1's, then i obtained
Nrow <- 100
Ncol <- 1000
A <- matrix((runif(Ncol*Nrow)<0.002)+0, nrow=Nrow)
library(rbenchmark)
benchmark(t1(A,outcome=1),
t2(A,outcome=1),
t3(A,outcome=1),
t1.c(A,outcome=1),
t2.c(A,outcome=1),
t3.c(A,outcome=1),
columns=c("test", "user.self", "relative"),
replications=1)
1 t1(A, outcome = 1) 0.776 13.362069
4 t1.c(A, outcome = 1) 0.308 5.275862
2 t2(A, outcome = 1) 0.172 2.965517
5 t2.c(A, outcome = 1) 0.176 3.034483
3 t3(A, outcome = 1) 0.060 1.051724
6 t3.c(A, outcome = 1) 0.056 1.000000
So, the result is slightly better for t2() with probability 0.002
of 1 than with probability 0.2. At least, it is faster than t1.c().
But still, t3() or t3.c() is faster.
Petr.
More information about the R-help
mailing list