[R] R parallel - slow speed
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Thu Jul 30 14:56:53 CEST 2015
Parallelizing comes at a price... and there is no guarantee that you can afford it. Your whole serial run takes under 5 ms, so the fixed cost of dispatching 50 tasks to workers and collecting the results easily dominates the computation. Vectorizing your algorithm is often a better approach; see the sketch below. Microbenchmarking is usually overkill for evaluating parallelization.
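For example, here is an untested sketch of a vectorized version that evaluates all rows of Xeval in a few matrix operations via outer(). The name npnewpar_vec is mine, and I am assuming your discrete weight l() is meant as an indicator that Xd matches xd; if your Weights matrix encodes something else, substitute that lookup:

npnewpar_vec <- function(y, Xc, Xd, h, Xeval) {
  K <- dnorm(outer(Xc, Xeval[, 1], "-") / h)  # n x n1 Gaussian kernel weights
  L <- outer(Xd, Xeval[, 2], "==") + 0        # n x n1 discrete indicator weights
  W <- K * L
  colSums(y * W) / colSums(W)                 # ghat at every evaluation point
}

At n = 500 and n1 = 50 those matrices are tiny, so this should beat both apply() and parApply() with no worker processes at all.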
You assume 4 cores... but many CPUs have 2 physical cores and use hyperthreading to make each one look like two, so a count of 4 does not guarantee 4 cores' worth of compute.
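parallel::detectCores() can report both counts, which is worth checking before sizing a cluster:

library(parallel)
detectCores(logical = FALSE)  # physical cores (NA if the platform cannot say)
detectCores(logical = TRUE)   # logical CPUs, hyperthreads included (the default)

Running more compute-bound workers than physical cores usually buys little.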
The operating system can make a difference also... Windows processes are more expensive to start and communicate between than *nix processes are. In particular, Windows seems to require duplicating RAM pages for each worker, while *nix can share process RAM (at least until it is written to), so you end up needing more memory, and paging of virtual memory to disk becomes more likely.
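If you are on *nix, a FORK cluster sidesteps much of that: the workers inherit the parent's memory copy-on-write, so y, Xc, Xd, and Weights never have to be shipped to them. A minimal sketch against the code below (this cluster type is not available on Windows):

library(parallel)
cl <- makeCluster(4, type = "FORK")  # fork after the data and npnewpar exist
## the worker function finds y, Xc, Xd, Weights in the inherited
## workspace, so nothing is serialized except Xeval rows and results
res <- parApply(cl, Xeval, 1, function(xeval)
  npnewpar(y, Xc, Xd, Weights, h = 0.5, xeval = xeval))
stopCluster(cl)

mclapply() in the same package forks on each call and avoids the explicit cluster setup entirely.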
---------------------------------------------------------------------------
Jeff Newmiller                             DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On July 30, 2015 8:26:34 AM EDT, Martin Spindler <Martin.Spindler at gmx.de> wrote:
>Dear all,
>
>I am trying to parallelize the function npnewpar given below. When I
>compare an application of "apply" with "parApply", the parallelized
>version seems to be much slower (cf. the output below). I would
>therefore like to ask how the function could be parallelized more
>efficiently. (With increasing sample size the difference becomes
>smaller, but I was wondering about this big difference and how it
>could be reduced.)
>
>Thank you very much for help in advance!
>
>Best,
>
>Martin
>
>
>library(microbenchmark)
>library(doParallel)
>
>n <- 500
>y <- rnorm(n)
>Xc <- rnorm(n)
>Xd <- sample(c(0,1), n, replace=TRUE)  # n draws, to match length(Xc)
>Weights <- diag(n)
>n1 <- 50
>Xeval <- cbind(rnorm(n1), sample(c(0,1), n1, replace=TRUE))
>
>
>detectCores()
>cl <- makeCluster(4)
>registerDoParallel(cl)
>microbenchmark(
>  apply(Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd,
>        Weights = Weights, h = 0.5),
>  parApply(cl, Xeval, 1, npnewpar, y = y, Xc = Xc, Xd = Xd,
>           Weights = Weights, h = 0.5),
>  times = 100)
>stopCluster(cl)
>
>
>Unit: milliseconds
>                                   expr        min         lq       mean     median         uq        max neval
>     apply(Xeval, 1, npnewpar, ...)        4.674914   4.726463   5.455323   4.771016   4.843324   57.01519   100
> parApply(cl, Xeval, 1, npnewpar, ...)    34.168250  35.434829  56.553296  39.438899  49.777265  347.77887   100
>
>npnewpar <- function(y, Xc, Xd, Weights, h, xeval) {
>  xc <- xeval[1]        # continuous coordinate of the evaluation point
>  xd <- xeval[2]        # discrete coordinate (coded 0/1)
>  l <- function(x, X) {
>    ## NB: R drops a 0 index, so with 0/1 codes this returns numeric(0)
>    ## when x == 0 and silently drops the zeros in X; indexing with
>    ## x + 1 and X + 1 (or comparing X == x) is probably what is wanted
>    w <- Weights[x, X]
>    return(w)
>  }
>  u <- (Xc - xc) / h    # scaled distances for the continuous regressor
>  #K <- kernel(u)
>  K <- dnorm(u)         # Gaussian kernel weights
>  L <- l(xd, Xd)        # discrete-part weights
>  nom <- sum(y * K * L)
>  denom <- sum(K * L)
>  ghat <- nom / denom   # Nadaraya-Watson estimate at xeval
>  return(ghat)
>}