[R] No speed up using the parallel package and ncpus > 1 with boot() on linux machines
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sat Oct 17 19:28:12 CEST 2015
None of this is surprising. If the pieces you divide your work into are
small, then the overhead of communicating between parallel processes is a
relatively large penalty to pay. You have to break your problem up into
larger chunks and rely on vectorised processing within each worker process
to keep the CPUs busy doing useful work.
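For example (an untested sketch, not your code, numbers picked purely for
illustration): dispatching one tiny task per bootstrap replicate pays the
fork/collect overhead once per replicate, whereas handing each core one big
chunk pays it only once per core:

library(parallel)
x <- rnorm(500)
nReps  <- 2000
nCores <- 2                       # adjust for your machine

## one tiny task per replicate: mc.preschedule=FALSE forces a fork per task
system.time(
  tiny <- mclapply(seq_len(nReps),
                   function(i) mean(sample(x, replace = TRUE)),
                   mc.cores = nCores, mc.preschedule = FALSE)
)

## one large task per core: each worker loops over its share of the work
system.time(
  chunked <- mclapply(seq_len(nCores),
                      function(core) replicate(nReps / nCores,
                                               mean(sample(x, replace = TRUE))),
                      mc.cores = nCores)
)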
Also, I am not aware of any model of Mac Mini that has 8 physical cores...
4 is the maximum. Hyperthreaded ("virtual") cores simplify the logical view
of multiprocessing but do not offer much real performance improvement,
because there are only as many physical data paths and execution units as
there are cores.
Note that your problems are with long-running simulations... your examples
are too small to demonstrate the actual balance of processing vs.
communication overhead. Before you draw conclusions, try upping bootReps
by a few orders of magnitude, and run your test code a couple
of times to stabilize the memory conditions and obtain some consistency
in timings.
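A fairer version of your comparison would be something along these lines
(untested; adjust ncpus and bootReps to suit your hardware and patience):

library(boot)
library(parallel)
dat <- rnorm(500)
bootMean <- function(dat, ind) mean(dat[ind])
bootReps <- 5e5                     # a few orders of magnitude more work

for (run in 1:2) {                  # repeat so the timings settle down
  print(system.time(boot(dat, bootMean, R = bootReps)))
  print(system.time(boot(dat, bootMean, R = bootReps,
                         parallel = "multicore", ncpus = 4)))
}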
I have never used the parallel option in the boot package before... I have
always rolled my own parallelism so that I can decide how much work to do
within each worker process before returning from it. (The communication
overhead is particularly severe when using snow, but it is not something
you can neglect with multicore either.)
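To be concrete, the kind of thing I mean is roughly this (an untested
sketch, not a drop-in replacement for boot's own option; the replicate
counts are arbitrary): give each core one large block of replicates, run an
independent boot() call inside each forked worker, and pool the results:

library(boot)
library(parallel)

dat <- rnorm(500)
bootMean <- function(dat, ind) mean(dat[ind])

bootReps    <- 20000
nCores      <- 2
repsPerCore <- bootReps / nCores

## L'Ecuyer streams give each forked worker independent, reproducible RNG
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)

pieces <- mclapply(seq_len(nCores),
                   function(core) boot(dat, bootMean, R = repsPerCore)$t,
                   mc.cores = nCores)
allReps <- unlist(pieces)           # pooled bootstrap statistics

## roughly the normal-approximation interval boot.ci(type = "norm") reports
theta <- mean(dat)
bias  <- mean(allReps) - theta
theta - bias + c(-1, 1) * qnorm(0.975) * sd(allReps)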
On Sat, 17 Oct 2015, Chris Evans wrote:
> I think I am failing to understand how boot() uses the parallel package on Linux machines. Using R 3.2.2 on three different machines with 2, 4 and 8 cores, I see a slow-down whenever I use "multicore" and "ncpus". Here is the code for a very simple reproducible example:
>
> bootReps <- 500
> seed <- 12345
> set.seed(seed)
> require(boot)
> dat <- rnorm(500)
> bootMean <- function(dat,ind) {
> mean(dat[ind])
> }
> start.time <- proc.time()
> bootDat <- boot(dat,bootMean,bootReps)
> boot.ci(bootDat,type="norm")
> stop.time <- proc.time()
> elapsed.time1 <- stop.time - start.time
> require(parallel)
> set.seed(seed)
> start.time <- proc.time()
> bootDat <- boot(dat,bootMean,bootReps,
> parallel="multicore",
> ncpus=2)
> boot.ci(bootDat,type="norm")
> stop.time <- proc.time()
> elapsed.time2 <- stop.time - start.time
> elapsed.time1
> elapsed.time2
>
> Running that on my old Dell Latitude E6500 running Debian Squeeze and
> using 32 bit R 3.2.2 gives me:
>
>> bootReps <- 500
>> seed <- 12345
>> set.seed(seed)
>> require(boot)
>> dat <- rnorm(500)
>> bootMean <- function(dat,ind) {
> + mean(dat[ind])
> + }
>> start.time <- proc.time()
>> bootDat <- boot(dat,bootMean,bootReps)
>> boot.ci(bootDat,type="norm")
> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> Based on 500 bootstrap replicates
>
> CALL :
> boot.ci(boot.out = bootDat, type = "norm")
>
> Intervals :
> Level Normal
> 95% (-0.0034, 0.1677 )
> Calculations and Intervals on Original Scale
>> stop.time <- proc.time()
>> elapsed.time1 <- stop.time - start.time
>> require(parallel)
>> set.seed(seed)
>> start.time <- proc.time()
>> bootDat <- boot(dat,bootMean,bootReps,
> + parallel="multicore",
> + ncpus=2)
>> boot.ci(bootDat,type="norm")
> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> Based on 500 bootstrap replicates
>
> CALL :
> boot.ci(boot.out = bootDat, type = "norm")
>
> Intervals :
> Level Normal
> 95% (-0.0030, 0.1675 )
> Calculations and Intervals on Original Scale
>> stop.time <- proc.time()
>> elapsed.time2 <- stop.time - start.time
>> elapsed.time1
> user system elapsed
> 0.028 0.000 0.174
>> elapsed.time2
> user system elapsed
> 4.336 2.572 0.166
>
> The 95% CIs differ very slightly, reflecting the way that invoking
> parallel="multicore" changes the seed handling, and there is a huge
> increase in CPU (user + system) time rather than any improvement.
>
> On a more recent four-core Toshiba, again on Debian Squeeze with 32-bit R
> and using ncpus=4, I get exactly the same CIs and this timing:
>
>> elapsed.time1
> user system elapsed
> 0.032 0.000 0.100
>> elapsed.time2
> user system elapsed
> 0.032 0.020 0.049
>>
>
> and on a Mac Mini with eight cores, on Squeeze but with 64-bit R, I get
> the same CIs and this timing:
>
>> elapsed.time1
> user system elapsed
> 0.012 0.004 0.017
>> elapsed.time2
> user system elapsed
> 0.032 0.012 0.024
>
> I am clearly missing something, or perhaps something other than CPU power is choking the work: RAM? I've tried searching the web for similar reports and was surprised to find nothing using what seemed like plausible search strategies.
>
> Is anyone able to help me? I'd desperately like to get a marked speed-up for some simulation work I'm doing on the Mac Mini, as it's currently taking days to run. The computationally intensive bits of the models are rather more complicated than this example (!), but most of the workload is in the bootstrapping, and the function I'm bootstrapping for real, although more complex than a simple mean, isn't that complex, though it does involve a stratified rather than a simple bootstrap. I see very similar marginal speed _losses_ when invoking more than one core for that work, just as with this very simple example.
>
> TIA,
>
> Chris
>
---------------------------------------------------------------------------
Jeff Newmiller                                  jdnewmil at dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)