[R] %dopar% parallel processing experiment

ivo welch ivo.welch at gmail.com
Sat Jul 2 20:42:07 CEST 2011


hi uwe--I did not know what snow was.  from my one-minute reading, it
seems to require a more involved setup, but to be much more flexible
once that setup cost has been incurred (specifically, it allows the
use of many machines).

the attractiveness of the doMC/foreach framework is its simplicity of
installation and use.
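
for readers unfamiliar with the framework, a minimal sketch of the
doMC/foreach pattern under discussion looks roughly like this (the
worker count and toy computation are illustrative only):

```r
## minimal doMC/foreach sketch: register a backend, then %dopar%
## farms the loop iterations out to the worker processes
library(doMC)
library(foreach)

registerDoMC(2)   # request 2 worker processes

res <- foreach(i = 1:4, .combine = c) %dopar% i^2
print(res)        # c(1, 4, 9, 16)
```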

but if I understand you correctly, you are using a different
parallelization framework, and it shows that my example completes much
faster under that framework.  correct?  if so, the problem is my use
of the doMC framework, not the inherent cost of dealing with multiple
processes.  is this interpretation correct?

regards,

/iaw

----
Ivo Welch (ivo.welch at gmail.com)
http://www.ivo-welch.info/


2011/7/2 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
>
>
> On 02.07.2011 20:04, ivo welch wrote:
>>
>> thank you, uwe.  this is a little disappointing.  parallel processing
>> for embarrassingly parallel operations--those needing no
>> communication--should be feasible if the worker processes are held
>> rather than created and released every time.  is there a light-weight
>> parallel processing framework that could facilitate this?
>
> Hmmm, now that you asked I checked it myself using snow:
>
> On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK
> clusters, i.e. slow communication) I get:
>
>
>
>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20),
>> i)))
>   user  system elapsed
>   3.10    0.19   51.43
>
> while on a single core without parallelization framework:
>
>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>   user  system elapsed
>  93.74    0.09   94.24
>
> Hence (although my prior assumption was that the overhead would also be big
> for frameworks other than foreach) it scales perfectly well with snow;
> perhaps you have to use foreach in a different way?
>
> Best,
> Uwe Ligges
>
>
>
>
>
>>
>> regards,
>>
>> /iaw
>>
>>
>> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>>
>>>
>>> On 02.07.2011 19:32, ivo welch wrote:
>>>>
>>>> dear R experts---
>>>>
>>>> I am experimenting with multicore processing, so far with pretty
>>>> disappointing results.  Here is my simple example:
>>>>
>>>> A <- 100000
>>>> randvalues <- abs(rnorm(A))
>>>> ## an arbitrary function
>>>> minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }
>>>>
>>>> ARGV <- commandArgs(trailingOnly = TRUE)
>>>>
>>>> if (ARGV[1] == "do-onecore") {
>>>>   library(foreach)
>>>>   discard <- foreach(i = 1:A) %do% uniroot(minfn, c(1e-20, 9e20), i)
>>>> } else if (ARGV[1] == "do-multicore") {
>>>>   library(doMC)
>>>>   registerDoMC()
>>>>   cat("You have", getDoParWorkers(), "cores\n")
>>>>   discard <- foreach(i = 1:A) %dopar% uniroot(minfn, c(1e-20, 9e20), i)
>>>> } else if (ARGV[1] == "plain") {
>>>>   for (i in 1:A) discard <- uniroot(minfn, c(1e-20, 9e20), i)
>>>> } else {
>>>>   cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")
>>>> }
>>>>
>>>>
>>>> on my Mac Pro 3,1 (2 quad-core CPUs), under R 2.12.0, which reports 8 cores:
>>>>
>>>>   "plain" takes about 68 seconds (real and user, using the unix time
>>>> command).
>>>>   "do-onecore" takes about 300 seconds.
>>>>   "do-multicore" takes about 210 seconds real (300 seconds user).
>>>>
>>>> this seems pretty disappointing.  for the most part, the cores are not
>>>> even busy.  feedback appreciated.
>>>
>>>
>>> Feedback is that a single computation within your foreach loop is so quick
>>> that the overhead of communicating data and results between processes costs
>>> more time than the actual evaluation; hence you are faster with a single
>>> process.
>>>
>>> What you should do is write code that performs, e.g., 10000 iterations
>>> inside each of 10 outer iterations, and put the foreach loop only around
>>> the outer 10.  Then you will probably be much faster (untested).  This is
>>> essentially the example I use in teaching to show when *not* to do
>>> parallel processing.....
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>
>>>
>>>> /iaw
>>>>
>>>>
>>>> ----
>>>> Ivo Welch (ivo.welch at gmail.com)
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
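
Uwe's suggestion above--to put many iterations inside each parallel
task so the per-task communication overhead is amortized--can be
sketched roughly as follows (the chunk count and toy function are
illustrative; swap %do% for %dopar% once a backend is registered):

```r
## chunked foreach: each task handles a block of indices, so the
## per-task overhead is paid 10 times instead of A times
library(foreach)

A      <- 10000
chunks <- split(1:A, cut(1:A, 10, labels = FALSE))  # 10 blocks of 1000

res <- foreach(idx = chunks, .combine = c) %do%     # %dopar% with a backend
  sapply(idx, function(i) sqrt(i))

stopifnot(length(res) == A)
```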


