Gardar Johannesson gardar at stat.ohio-state.edu
Thu Dec 20 17:54:08 CET 2001

This is a small success story about the benefits of changing
the defaults in config.site when building R-1.4.0 for Solaris
(on the Sun SPARC machine I am currently using).

For previous versions of R, I had just used the default config.site and
not given it any thought.  Since the Sun machine that I'm using
is not getting any faster, I decided I would give config.site a look
when building R-1.4.0.

Building R-1.4.0 from source with the defaults, i.e. running './configure'
and then 'make', gives the following compiler settings (short summary):

BLAS = blas.o
CC = cc
FC = f77

Following the suggestions in R-admin.html, I also built R-1.4.0 with:

BLAS_LIBS = -xlic_lib=sunperf -lsunmath
CC = cc -xarch=v9
CFLAGS = -xO5 -xlibmil -dalign
FC = f95 -xarch=v9
FFLAGS = -xO5 -xlibmil -dalign
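In config.site terms, the optimized settings correspond to something like the fragment below. config.site is read by ./configure as Bourne shell code, so plain VAR="value" assignments are all that is needed. This is a sketch, assuming the variable names used by the config.site shipped with the R sources (in particular F77 for the Fortran compiler, which the summary above labels FC); check the comments in your own copy before relying on it.

```shell
## Sketch of config.site settings for a 64-bit (v9) SPARC build with
## Sun's compilers and the Sun Performance Library as BLAS.
## Variable names are assumed from R's config.site; check your copy.
CC="cc -xarch=v9"
CFLAGS="-xO5 -xlibmil -dalign"
F77="f95 -xarch=v9"          # assumed name; shown as FC in the summary above
FFLAGS="-xO5 -xlibmil -dalign"
BLAS_LIBS="-xlic_lib=sunperf -lsunmath"
```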

I did a few tests comparing the speed of these two builds.  In short, I
saw about a 65% speed improvement for general use, more for regression
problems (roughly 2 to 2.5 times faster), and considerably more for
matrix multiplication (about 50 times faster).

Here are the tests.

1) Timing the tests/Examples/base-Ex.R script.  I did the following for
   the two builds:
     time ./bin/R --vanilla < tests/Examples/base-Ex.R > tmp.out
   resulting in the following times:
     R-1.4.0-def: 227.70u 26.88s 4:20.34 97.7%
     R-1.4.0-opt: 138.75u 30.90s 2:57.62 95.5%
   for the default and optimized builds respectively, where 227.70u and
   138.75u are the user CPU times.  That is, the default build is about
   65% slower.

2) A small MCMC example of mine that uses a for-loop to generate 10,000
   samples from the posterior:
     R-1.4.0-def: 14.45 sec user CPU
     R-1.4.0-opt:  8.96 sec user CPU
     S-6.0      : 34.19 sec user CPU
   where the last line is from S-PLUS 6.0 on the same machine.

3) A regression,
     lm(ozone ~ ns(lat.band, df=15) + ...,   # further terms omitted
        weights=1/var, data=data, na.action=na.omit)
   where data has 3240 rows in one case and 12960 rows in the other.
   The number of estimated parameters is 166 in both cases.
   For data with 3240 rows:
     R-1.4.0-def: 7.12 sec user CPU time
     R-1.4.0-opt: 2.90 sec user CPU time
     S-6.0      : 3.78 sec user CPU time
   For data with 12960 rows:
     R-1.4.0-def: 28.34 sec user CPU time
     R-1.4.0-opt: 14.97 sec user CPU time
     S-6.0      : 13.70 sec user CPU time

4) The result of system.time(B <- A %*% A) where A is a 500x500 matrix.
     R-1.4.0-def: 18.83 sec user CPU time
     R-1.4.0-opt:  0.37 sec user CPU time
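To check the summary figures against the raw numbers, the speed-ups can be recomputed from the user CPU times in tests 1-4. The POSIX shell/awk sketch below does that; the test labels are just illustrative names. Its second part turns the matrix timings into an approximate floating-point rate, assuming the usual textbook estimate of about 2*n^3 operations for an n x n product (a nominal count, not something measured).

```shell
#!/bin/sh
# Recompute the speed-ups quoted in the summary from the user CPU
# seconds reported in tests 1-4 (default build vs. optimized build).
# Each input line: test label, default seconds, optimized seconds.
speedups=$(printf '%s\n' \
  'base-Ex.R 227.70 138.75' \
  'MCMC 14.45 8.96' \
  'lm-3240 7.12 2.90' \
  'lm-12960 28.34 14.97' \
  'matmul-500 18.83 0.37' |
  awk '{ printf "%-10s default/optimized = %.2f\n", $1, $2 / $3 }')
echo "$speedups"

# For the 500x500 product, assume about 2*n^3 floating-point
# operations (n^3 multiplies plus roughly n^3 adds).
rates=$(awk 'BEGIN {
  n = 500; flops = 2 * n ^ 3
  printf "default:   %.1f Mflop/s\n", flops / 18.83 / 1e6
  printf "optimized: %.1f Mflop/s\n", flops / 0.37 / 1e6
}')
echo "$rates"
```

It prints speed-ups of 1.64, 1.61, 2.46, 1.89 and 50.89, and rates of roughly 13.3 Mflop/s for the default build against 675.7 Mflop/s for the optimized one.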

I hope this will be of use to somebody... cheers, Gardar

