[R] Rmpi performance
Thomas Lumley
tlumley at u.washington.edu
Fri Oct 13 18:39:54 CEST 2006
On Fri, 13 Oct 2006, Michela Cameletti wrote:
> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
>
> and we want to compute the function op_mat (see below) first on the
> master and then on the cluster, using system.time to check the
> computational performance.
>
> op_mat = function(mat) {
> + inv = solve(mat)
> + det_inv = det(inv)
> + tr_inv = sum(diag(inv))
> + return(list(c(det = det_inv, tr = tr_inv)))
> + }
>
>> nn = 3000
>> XX = matrix(rnorm(nn*nn),nn,nn)
> # with the master
>> system.time(op_mat(XX))
> [1] 42.283 1.883 44.168 0.000 0.000
> # with the cluster
>> system.time(clusterCall(cl, op_mat, XX))
> [1] 11.523 12.612 71.562 0.000 0.000
>
> You can see that with the master it takes 44.168 seconds to compute
> the function on matrix XX, while it takes 71.562 seconds (more time!!!)
> with the cluster. Can you give us some advice to help us understand why
> the cluster is slower than the master?
clusterCall() evaluates the same call on each computer in the cluster, so
it will always be slower than just evaluating on the master. It is
useful for setup that has to be performed on each machine, or for parallel
evaluation of randomized functions (e.g. bootstrapping, simulation).
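For example, a minimal sketch assuming the snow cluster 'cl' created above
(the replicate function one_boot is just a made-up illustration):

  library(snow)

  ## clusterCall()/clusterEvalQ() run the *same* expression once on every
  ## node, so the total work is repeated -- useful for setup, not speed-up.
  clusterEvalQ(cl, library(MASS))      # load a package on each node

  ## For randomized work each node runs its own replicates.
  ## (clusterSetupRNG() would give the nodes independent RNG streams.)
  one_boot <- function(i) mean(sample(1:100, 100, replace = TRUE))
  res <- clusterApply(cl, 1:12, one_boot)   # 12 replicates spread over nodes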
To split up a single computation you have to do it explicitly, e.g. with
parLapply, parSapply, and parApply, or parMM for parallel matrix
multiplication. It's unlikely that you could speed up inverting a dense
matrix even with gigabit ethernet for communication -- the success of
ATLAS and Dr Goto's tuned BLAS libraries shows that the time taken for
dense linear algebra can be dominated by communication overhead even
between a CPU and its own memory.
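As a sketch, again assuming 'cl' and the 3000 x 3000 matrix XX from above
(the list of smaller matrices is made up purely to illustrate parLapply):

  ## Explicit splitting: each node inverts its own matrix in parallel.
  mats <- replicate(3, matrix(rnorm(500 * 500), 500, 500), simplify = FALSE)
  res  <- parLapply(cl, mats, function(m) sum(diag(solve(m))))

  ## parMM() splits one matrix product across the nodes; compare it
  ## against the master doing the same product alone.
  system.time(parMM(cl, XX, XX))
  system.time(XX %*% XX)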
-thomas