[R] Has R recently made performance improvements in accumulation?
Brent
yhbrent at yahoo.com
Mon Aug 1 04:20:47 CEST 2016
Thierry: thanks much for your feedback, and apologies for this tardy response.
You pointed me in the right direction. I had not appreciated that even if an algorithm ultimately has O(n^2) behavior, it can take a big n for the quadratic term to overcome large coefficients on the lower-order terms (the O(1) and O(n) parts).
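As a toy illustration of that point (coefficients completely made up, not measured from anything):

cost = function(n) 1000 * n + 0.01 * n^2  # big linear coefficient, small quadratic one
cost(1e3)  # 1e6 + 1e4: the linear term dwarfs the quadratic term
cost(1e7)  # 1e10 + 1e12: now the quadratic term dominates

The crossover is at n = 1000 / 0.01 = 100,000, so timings below that look roughly linear.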
A quick fix to my original code is to simply use 50 columns in each row instead of 10 (nCol below), and to look at bigger numbers of rows as well:
n = 20
numRows = seq(from = 1 * 1000, to = 20 * 1000, length.out = n)
nCol = 50
execTimes = vector(mode = "numeric", length = n)
for (i in 1:n) {
    nRow = numRows[i]
    t1 = Sys.time()
    mkFrameForLoop(nRow, nCol)
    t2 = Sys.time()
    # CRITICAL: use difftime (not t2 - t1) to ensure the units are always seconds
    execTimes[i] = difftime(t2, t1, units = "secs")
}
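(For anyone who does not have the earlier messages in this thread: mkFrameForLoop is the accumulation function being timed. Its actual definition appeared earlier in the thread; a minimal sketch of the pattern, assuming it grows a data frame one row at a time with rbind, would be:

mkFrameForLoop = function(nRow, nCol) {
    d = c()
    for (i in seq_len(nRow)) {
        ri = data.frame(matrix(runif(nCol), nrow = 1))  # one new row
        d = rbind(d, ri)  # copies everything accumulated so far
    }
    d
}

Each rbind copies the whole frame built so far, which is where the quadratic cost comes from.)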
A simple plot shows obvious nonlinearity now:
plot(numRows, execTimes)
For those of you reading this as plain text, a human-readable table can be produced with
df = data.frame(numRows = numRows, execTimes = execTimes)
df
which yields (execTimes in seconds)
   numRows  execTimes
1     1000   3.564204
2     2000   8.268473
3     3000  14.923853
4     4000  23.506344
5     5000  31.379795
6     6000  43.820506
7     7000  56.720244
8     8000  72.979174
9     9000  97.328567
10   10000 113.404486
11   11000 141.113071
12   12000 145.597327
13   13000 168.967664
14   14000 196.135218
15   15000 219.662564
16   16000 237.763599
17   17000 275.018730
18   18000 305.647482
19   19000 327.215715
20   20000 359.673572
Finally, a quick power-law fit using
lm( log(execTimes) ~ log(numRows), data = df )
yields
Coefficients:
(Intercept) log(numRows)
-10.065 1.605
(Since execTimes ~ c * numRows^p implies log(execTimes) = log(c) + p * log(numRows), the slope of the log-log fit estimates the power: p is about 1.605 over this range of data, which is clearly > 1, i.e. superlinear.)
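One way to pull the estimated power out of the fit and overlay the fitted curve on the timing plot (this bit is illustrative, not something I ran; fit and p are just names I am using here):

fit = lm(log(execTimes) ~ log(numRows), data = df)
p = coef(fit)[["log(numRows)"]]  # estimated power, about 1.605 here
plot(numRows, execTimes)  # redraw the raw timings
curve(exp(coef(fit)[[1]]) * x^p, add = TRUE)  # fitted power law, back-transformed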
boB Rudis: thanks much for the functional elegance suggestion.