[R] Effect of an optimized BLAS library on lm() function
Stéphane Tufféry
tuffery.stephane at gmail.com
Wed Aug 3 21:37:13 CEST 2016
Dear All,
My question is simple (but the answer perhaps less so): how does R's lm()
function benefit from an optimized BLAS library (such as ATLAS, OpenBLAS, or
the MKL library in R Open)? I had supposed that these BLAS optimizations
concentrate on the level-3 BLAS, while lm() relies on the level-1 BLAS
routines of LINPACK.
Nevertheless, the following test on my laptop seems to indicate
slightly faster calculations with lm() when using the MKL library.
# packages used below: microbenchmark, biglm, speedglm, bigmemory,
# biganalytics; fastLm() comes from RcppArmadillo or RcppEigen
> set.seed(123)
> m <- 100000
> n <- 100
> matest <- matrix(rnorm(m*n,0,2), m, n)
> y <- rnorm(m)
> x <- cbind(1, matest)
> test <- data.frame(matest, y = y)
> n <- names(test)
> f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
> bigtest <- as.big.matrix(cbind(matest, y))
> options(bigmemory.allow.dimnames=TRUE)
> colnames(bigtest) <- names(test)
> compare <- microbenchmark(lm(y ~ ., data=test), lm.fit(x,y),
+   biglm(f, data=test), bigglm(f, data=test, family=gaussian()),
+   speedglm(y ~ ., data=test, family=gaussian()),
+   fastLm(y ~ ., data=test, family=gaussian()),
+   biglm.big.matrix(f, data=bigtest, chunksize=100000),
+   solve(crossprod(x), crossprod(x,y)),
+   solve(t(x)%*%x, t(x)%*%y),
+   qr.coef(qr(x, LAPACK = T), y),
+   qr.coef(qr(x, LAPACK = F), y), times=30, unit="ms")
# on R base
> compare
Unit: milliseconds
                                                  expr       min        lq      mean    median        uq       max neval
                                lm(y ~ ., data = test) 2423.4618 2497.0986 2587.6717 2528.4108 2605.8664 3245.0846    30
                                          lm.fit(x, y) 1785.2536 1816.0466 1864.9683 1841.6295 1868.1580 2166.6797    30
                                 biglm(f, data = test) 1947.5808 2001.2834 2044.3119 2021.8473 2084.5147 2304.7108    30
           bigglm(f, data = test, family = gaussian()) 4461.9307 4544.2631 4726.1232 4629.1442 4705.2304 6240.9467    30
     speedglm(y ~ ., data = test, family = gaussian()) 1651.4826 1670.5076 1727.1039 1696.9502 1771.4395 1946.8591    30
       fastLm(y ~ ., data = test, family = gaussian()) 3939.3955 4051.2525 4214.4684 4149.4943 4301.5698 5037.1337    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2176.9082 2283.2670 2336.6303 2345.2608 2383.1442 2478.3213    30
                  solve(crossprod(x), crossprod(x, y))  839.3229  844.4637  867.8926  856.9103  882.7036  949.5098    30
                         solve(t(x) %*% x, t(x) %*% y) 1436.7063 1464.4559 1531.5375 1494.3093 1532.6056 1916.9786    30
                         qr.coef(qr(x, LAPACK = T), y) 1989.8013 2031.9686 2091.0366 2063.0931 2128.2705 2317.7407    30
                         qr.coef(qr(x, LAPACK = F), y) 1752.9953 1809.3616 1856.6803 1826.0688 1887.2954 2111.4479    30
# on R Open (2 threads)
> getMKLthreads()
[1] 2
> compare
Unit: milliseconds
                                                  expr        min         lq       mean     median        uq       max neval
                                lm(y ~ ., data = test) 1788.61291 1875.63929 1976.74912 1928.96796 2082.2125 2371.3407    30
                                          lm.fit(x, y) 1097.40423 1194.82479 1256.65020 1263.27286 1299.7869 1451.5524    30
                                 biglm(f, data = test) 1936.05859 1983.08455 2041.44009 2019.54653 2103.5658 2211.2873    30
           bigglm(f, data = test, family = gaussian()) 4707.12674 4795.69165 5005.35978 5023.22328 5193.1482 5340.9577    30
     speedglm(y ~ ., data = test, family = gaussian())  881.47945  934.77104  996.59551  975.93888 1038.1470 1212.8197    30
       fastLm(y ~ ., data = test, family = gaussian()) 1671.48208 1810.42245 1883.39515 1880.31249 1936.2377 2324.2279    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2196.43003 2266.86705 2339.93722 2355.17538 2401.8879 2585.1069    30
                  solve(crossprod(x), crossprod(x, y))   63.50803   70.25601   92.00605   74.73306  101.0346  363.4992    30
                         solve(t(x) %*% x, t(x) %*% y)  269.83961  298.29031  340.11531  320.85419  350.4334  541.5970    30
                         qr.coef(qr(x, LAPACK = T), y)  864.66990  941.90109 1000.73358  985.70631 1030.5733 1341.8595    30
                         qr.coef(qr(x, LAPACK = F), y) 1155.82456 1253.15555 1309.30060 1299.67349 1354.4523 1556.7010    30
# on R Open (1 thread)
> getMKLthreads()
[1] 1
> compare
Unit: milliseconds
                                                  expr       min        lq      mean    median        uq       max neval
                                lm(y ~ ., data = test) 1991.4313 2034.4098 2111.0498 2104.9173 2182.6686 2379.2731    30
                                          lm.fit(x, y) 1329.0017 1347.4155 1437.7724 1409.5693 1519.5183 1690.3965    30
                                 biglm(f, data = test) 1922.7935 1979.4146 2030.7355 2008.3577 2066.5151 2265.3957    30
           bigglm(f, data = test, family = gaussian()) 4474.7666 4534.6946 4589.0903 4594.6761 4637.1702 4744.8987    30
     speedglm(y ~ ., data = test, family = gaussian())  918.9388  959.5755 1015.1099 1000.7063 1066.6677 1178.5648    30
       fastLm(y ~ ., data = test, family = gaussian()) 1737.4001 1785.1347 1908.2960 1905.7776 1992.8247 2188.0665    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2172.3621 2242.7618 2309.5848 2317.5166 2362.6432 2434.8259    30
                  solve(crossprod(x), crossprod(x, y))  122.5138  125.0064  132.2125  128.8345  137.1886  157.8529    30
                         solve(t(x) %*% x, t(x) %*% y)  330.3578  336.0520  390.2391  379.9801  413.4324  530.6566    30
                         qr.coef(qr(x, LAPACK = T), y)  917.1596  943.9504  987.5824  971.1059 1001.2241 1231.8204    30
                         qr.coef(qr(x, LAPACK = F), y) 1313.7414 1348.3933 1439.4807 1480.5955 1518.4831 1569.0909    30
We notice that solve(crossprod(x), crossprod(x, y)) is faster than
solve(t(x) %*% x, t(x) %*% y), because crossprod(x) exploits the fact that
the matrix t(x) %*% x is symmetric and computes only one triangle of it.
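The equivalence is easy to check; a minimal sketch in base R (as far as I
know, crossprod() goes through the symmetric rank-k BLAS routine dsyrk,
while the explicit product goes through the general dgemm):

```r
set.seed(1)
x <- matrix(rnorm(2000 * 50), 2000, 50)
a <- crossprod(x)   # symmetric product, computed from one triangle
b <- t(x) %*% x     # explicit transpose + general matrix product
all.equal(a, b)     # same matrix, but crossprod() does about half the work
```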
We notice that solve(t(x) %*% x, t(x) %*% y) is faster than qr.coef(qr(x),
y): solving the small n x n normal equations (by a Cholesky factorization,
or by the LU factorization that solve() actually uses by default) requires
roughly half the floating-point operations of a QR decomposition of the
m x n matrix x.
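For comparison, a minimal sketch of the normal-equations route written out
with an explicit Cholesky factorization (chol() exploits the symmetry that
a plain solve() does not); the variable names are mine, not from the
benchmark above:

```r
set.seed(1)
m <- 1000; n <- 10
x <- cbind(1, matrix(rnorm(m * (n - 1)), m, n - 1))
y <- rnorm(m)
## X'X = R'R with R upper triangular, so X'X b = X'y
## reduces to two cheap triangular solves
R <- chol(crossprod(x))
beta <- backsolve(R, forwardsolve(t(R), crossprod(x, y)))
all.equal(drop(beta), unname(qr.coef(qr(x), y)))  # matches the QR solution
```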
We notice that qr.coef(qr(x, LAPACK = T), y) is faster than qr.coef(qr(x,
LAPACK = F), y), at least with an optimized BLAS library: the LAPACK path
spends its time in BLAS routines and so gains from the optimized library,
whereas the default path uses LINPACK code that does not.
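The two QR paths are interchangeable for coefficients; a minimal sketch
(note that qr(x, LAPACK = TRUE) uses LAPACK's column-pivoted, blocked QR,
and qr.coef() undoes the pivoting, so both give the same answer):

```r
set.seed(1)
x <- cbind(1, matrix(rnorm(5000 * 50), 5000, 50))
y <- rnorm(5000)
b_lapack  <- qr.coef(qr(x, LAPACK = TRUE), y)   # level-3 BLAS, pivoted
b_linpack <- qr.coef(qr(x, LAPACK = FALSE), y)  # level-1 LINPACK-style
all.equal(unname(b_lapack), unname(b_linpack))
```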
We notice that the calculation time of qr.coef(qr(x, LAPACK = F), y) is of
the same order as that of lm.fit(), which also uses a LINPACK-based QR
decomposition.
Of course, lm() takes longer than lm.fit(), because lm() has to process the
formula, build the model matrix, and perform additional computations.
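A minimal sketch of this overhead: lm() is essentially model.matrix()
followed by lm.fit(), so calling lm.fit() on a prebuilt model matrix skips
the formula machinery (names here are illustrative, not from the benchmark):

```r
set.seed(1)
d <- data.frame(matrix(rnorm(1000 * 5), 1000, 5), y = rnorm(1000))
mm <- model.matrix(y ~ ., data = d)   # the extra step lm() performs
fit_lm  <- lm(y ~ ., data = d)        # formula interface, more bookkeeping
fit_raw <- lm.fit(mm, d$y)            # bare QR fit on the numeric matrix
all.equal(unname(coef(fit_lm)), unname(fit_raw$coefficients))
```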
We also notice that the biglm() and bigglm() functions of the biglm package
do not benefit from the optimized MKL BLAS library, but that all the other
functions do, including lm() and lm.fit(), even with only one thread.
Sincerely,
Stéphane