[R] memory-efficient column aggregation of a sparse matrix

Jon Stearley jrstear at sandia.gov
Fri Feb 2 00:42:47 CET 2007

On Feb 1, 2007, at 6:22 AM, Douglas Bates wrote:

> It turns out that in the sparse matrix code used by the
> Matrix package the triplet representation allows for duplicate index
> positions with the convention that the resulting value at a position
> is the sum of the values of any triplets with that index pair.

Very handy!  I suggest adding this nugget near the "(possibly  
redundant) triplets" phrase in Matrix.pdf.
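
For anyone reading along, here is a minimal sketch of that convention, assuming a reasonably current Matrix package. The object name `m` and the values are made up for illustration; the slots are filled directly, so the indices are the 0-based ones discussed below.

```r
library(Matrix)

# Three triplets; the first two share the 0-based index pair (0, 0),
# i.e. row 1, column 1 in R terms.
m <- new("dgTMatrix",
         i = c(0L, 0L, 1L),
         j = c(0L, 0L, 2L),
         x = c(1, 2, 5),
         Dim = c(2L, 3L))

# All three triplets are stored as given ...
length(m@x)

# ... but the value seen at a duplicated position is the sum of its triplets.
dense <- as.matrix(m)
dense[1, 1]
dense[2, 3]
```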

> If you decide to use this approach please be aware that the indices
> for the triplet representation in the Matrix package are 0-based (as
> in C code) not 1-based (as in R code).  (I imagine that Martin is
> thinking "we really should change that" as he reads this part.)
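
A small sketch of what the 0-based convention means in practice, again assuming a current Matrix package. `col_labels` is a hypothetical lookup vector, not something from the thread.

```r
library(Matrix)

# Build with ordinary 1-based R indices, then coerce to triplet form.
m <- as(sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(10, 20),
                     dims = c(2, 3)),
        "TsparseMatrix")

# The stored slots are 0-based: column 3 is held as 2.
m@i
m@j

# So add 1 before using a slot to index an ordinary R vector.
col_labels <- c("a", "b", "c")   # hypothetical per-column labels
col_labels[m@j + 1L]
```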

The Value of the appended function is equivalent to that of my previous
version, but it runs in 1/10th the time, uses vastly less memory,
and is fewer lines of code to boot!  Sure it's tricky, but it does
the trick.



NEWaggregate.csr <- function(x,fac) {
         # cast into handy Matrix sparse Triplet form
         x.T <- as(as(x, "dgRMatrix"), "dgTMatrix")

         # factor column indexes (compensating for 0 vs 1 indexing)
         x.T@j <- as.integer(as.integer(fac[x.T@j+1])-1)

         # cast back, magically computing factor sums along the way :)
         y <- as(x.T, "matrix.csr")

         # and fix the dimension (doing this on x.T bus errors!)
         y@dimension <- as.integer(c(nrow(y),nlevels(fac)))

         # return the aggregated matrix
         y
}
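
The same trick can be sketched using only the Matrix package, which avoids the coercions to and from matrix.csr. This is a variant for illustration, not the function above: `aggregate_cols` and the toy data are hypothetical names, and it assumes `x` is a general (non-symmetric) numeric sparse matrix and `fac` has one entry per column with no NAs.

```r
library(Matrix)

aggregate_cols <- function(x, fac) {
  fac <- as.factor(fac)
  # triplet form of the input; slots i and j are 0-based
  x.T <- as(x, "TsparseMatrix")
  # remap each column index to its factor level (back to 0-based);
  # columns sharing a level now collide and are summed by convention
  new("dgTMatrix",
      i = x.T@i,
      j = as.integer(fac)[x.T@j + 1L] - 1L,
      x = x.T@x,
      Dim = c(nrow(x), nlevels(fac)))
}

# Toy input: 2 x 3 matrix, columns 1 and 2 belong to level "a"
x <- sparseMatrix(i = c(1, 1, 2, 2), j = c(1, 2, 2, 3),
                  x = c(1, 2, 3, 4), dims = c(2, 3))
fac <- factor(c("a", "a", "b"))

y <- aggregate_cols(x, fac)
as.matrix(y)
```

Building a new object with the final Dim sidesteps having to change the dimension of an existing one after the indices have been rewritten.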
