[R] Summary of colSums

Douglas Bates bates at stat.wisc.edu
Thu Jul 19 17:28:59 CEST 2001

"David Kane  <David Kane" <a296180 at mica.fmr.com> writes:

> However, Doug Bates was kind enough to send me a simple math
> utilities package that has a version of colSums in C. The package
> installed without a hitch and produced similar performance to the S+
> colSums function (in my simple tests).

Actually I called the package MatUtils for "Matrix Utilities".

Some on the list may be interested in how the package was created.
Although the information on how to do this in the "Writing R
Extensions" Manual, available as a PDF file in the doc section on
CRAN, it is good to see a short example.

Because the calculation is simple and we want it to run fast, I used C
code and the .Call interface.  

The first step is to write the R function colSums.

> colSums <- function(m) .Call("colSums", as.matrix(m))

In R-1.3.0 you can use the package.skeleton function to create the
skeleton of a package directory.

> package.skeleton(name = "MatUtils", list(colSums))

This creates a directory called MatUtils along with the R, man, data,
and src subdirectories and the DESCRIPTION, README R/colSums.R and
man/colSums.Rd files.

I then added src/colSums.c
--- begin colSums.c ---
#include <R.h>
#include <Rdefines.h>

SEXP colSums(SEXP m)
    int i, j, *mdims, n, p;
    double *mm, sum;
    SEXP val;

    if (!isMatrix(m)) error("m must be a matrix");
    mdims = INTEGER(coerceVector(getAttrib(m, R_DimSymbol), INTSXP));
    n = mdims[0]; p = mdims[1];
    PROTECT(val = allocVector(REALSXP, p));
    mm = REAL(m);
    for (j = 0; j < p; j++) {
        sum = 0.;
        for (i = 0; i < n; i++) sum += mm[i];
        REAL(val)[j] = sum;
        mm += n;
    return val;
--- end colSums.c ---
edited the DESCRIPTION file so it reads
--- begin DESCRIPTION ---
Package: MatUtils
Title: Some utility functions for Matrices
Version: 1.0
Author: Douglas Bates <bates at stat.wisc.edu>
Description: Matrix Utility functions such as colSums
Maintainer: Douglas Bates <bates at stat.wisc.edu>
License: GPL version 2 or later
--- end DESCRIPTION ---
edited man/colSums.Rd to read
--- begin man/colSums.Rd ---
\title{Compute the column sums of a matrix}
  \code{colSums} computes the column sums of a matrix
  \item{m}{a numeric matrix.}
  a numeric vector of the column sums of the matrix \code{m}
\author{Douglas Bates \email{bates at stat.wisc.edu}}
colSums(matrix(1:20, ncol = 4))
--- end colSums.Rd ---
and - this is important - added a file R/zzz.R with contents
--- begin R/zzz.R ---
.First.lib <- function(lib, pkg) {
    library.dynam("MatUtils", pkg, lib)
--- end R/zzz.R ---

I think I would have forgotten to write the .First.lib function but
Thomas has thoughtfully provided a README file in the src directory
that reminds you to do this.

On Linux/Unix one can then cd to the parent directory of MatUtils and

R CMD build --force MatUtils   # this creates the INDEX file
R CMD check MatUtils

If the check fails (and, in fact, I did have mistakes in the C code in
the first version I tried to compile) then go through the usual
edit/debug cycle.

As for timings, you wrote (and I hope it is ok to send this to the list)

 Thanks for the answer and the package. It installed without a hitch on Solaris
 2.6. Here is a summary of the *very* brief comparison that I did.

 > library(MatUtils)
 > m <- matrix(rnorm(10000000), 100000, 100)
 > system.time(a1 <- apply(m, 2, sum))
 [1] 6.77 0.00 6.87 0.00 0.00
 > system.time(a2 <- colSums(m))
 [1] 0.43 0.00 0.43 0.00 0.00
 > all.equal(a1, a2)
 [1] TRUE

 Compare these to the results in S+ 6.0.

 > m <- matrix(rnorm(10000000), 100000, 100)
 > sys.time(a1 <- apply(m, 2, sum))
 [1] 18.74 26.09
 > sys.time(a2 <- colSums(m))
 [1] 0.49 0.55
 > all.equal(a1, a2)
 [1] T
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list