[R] incrementing matrix elements more efficiently

Romain Francois romain.francois at dbmail.com
Sun Aug 15 09:34:11 CEST 2010


Le 15/08/10 02:43, david h shanabrook a écrit :
> I need to increment cells of a matrix (collusionM).  The indexes to increment are in an index (matchIndex). This is sample  code
>
>     library(seqinr)
>     library(Matrix)
>     x<- "abcabcabc"
>     mx<- s2c(x)
>     collisionM<- Matrix(0,nrow=10, ncol=10, sparse=TRUE)
>     matchIndex<- list()
>     rows<- length(mx)
>     for (j in 1:(rows)) matchIndex[[j]]<- which(mx[j]==mx)
>     for (j in 1:(rows)) collisionM[j,matchIndex[[j]]]<- collisionM[j,matchIndex[[j]]] + 1
>
>    Works fine, except with my data (rows=32000) it is too slow.  The problem is the second for loop, where it increments the index of the sparse matrix; this needs to be rewritten so it is more efficient.  Any ideas?

Hi,

Nice exercise to demonstrate matrix indexing. First let's look at it on 
a simpler example:

 > x <- matrix( 1:16, nr = 4 )
 > x
      [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
 > y <- cbind( c(1,2,3,4), c(1,2,3,4) )
 > y
      [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    4
 > x[ y ]
[1]  1  6 11 16

Now we can apply it to your example, below f is your code and g uses 
matrix indexing:

f <- function(x = "abcabcabc"){
	mx <- s2c(x)
	rows <- length(mx)
	collisionM <- Matrix(0,nrow=rows, ncol=rows, sparse=TRUE)
	matchIndex <- list()
	for (j in 1:(rows)) matchIndex[[j]] <- which(mx[j]==mx)
	for (j in 1:(rows)) collisionM[j,matchIndex[[j]]] <- 
collisionM[j,matchIndex[[j]]] + 1
	collisionM
}

g <- function( x = "abcabcabc"){
	mx <- s2c(x)
	rows <- length(mx)
	collisionM <- Matrix(0,nrow=rows, ncol=rows, sparse=TRUE)
	
	matchIndex <- do.call( rbind, lapply( 1:rows, function(	i){
		cbind( i, which( mx[i] == mx ) )
	} ) )
	collisionM[ matchIndex ] <- collisionM[matchIndex] + 1
	collisionM
}

# first check it works as expected :
 > all( f() == g() )
[1] TRUE

# then do some timings
 > long <- paste( sample(letters, 1000, replace = TRUE ), collapse = "" )
 > system.time( f( x = long ) )
    user  system elapsed
   8.022   1.052   9.074
 > system.time( g( x = long ) )
    user  system elapsed
   0.079   0.003   0.082



... and it seems we don't even need indexing at all, we can just create 
the matrix using sparseMatrix :

h <- function( x = "abcabcabc"){
	mx <- s2c(x)
	rows <- length(mx)
	matchIndex <- do.call( rbind, lapply( 1:rows, function(	i){
		cbind( i, which( mx[i] == mx ) )
	} ) )
	sparseMatrix( matchIndex[,1], matchIndex[,2] )
}

which gives some improvement :

 > system.time( h( x = long ) )
    user  system elapsed
   0.048   0.000   0.048

Romain


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/bzoWrs : Rcpp svn revision 2000
|- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
`- http://bit.ly/aAyra4 : highlight 0.2-2



More information about the R-help mailing list