[R] embarrassingly parallel problem - simple loop solution

Martin Morgan mtmorgan at fhcrc.org
Fri Jul 11 03:51:12 CEST 2008

Hi Chris --

"Chris Gaiteri" <gaiteri at gmail.com> writes:

> I have an "embarrassingly parallel" routine that I need to run 24000^2/2
> times (based on some microarray data).  All I really need to do is
> parallelize a nested for-loop.  But I haven't found a clear list of what
> packages/commands I'd need to do this.  I've got a dual quad core xeon

Any of snow / Rmpi / nws / rpvm (the first has system requirements,
the other three additional software requirements) provide the basic
embarrassingly parallel functionality via variants of lapply.
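
A minimal sketch of the lapply-style interface, assuming snow is
installed and a socket cluster on the local machine is acceptable. The
function slow_square is a made-up stand-in for the real per-element
work, and the worker count of 2 is arbitrary.

```r
library(snow)

# Hypothetical stand-in for the real per-element computation
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# Serial version
serial <- lapply(1:8, slow_square)

# Parallel version: start 2 R worker processes on this machine
cl <- makeCluster(2, type = "SOCK")
parallel_res <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

# Same values, returned in the same order as the input
stopifnot(identical(serial, parallel_res))
```

The only change from the serial code is the extra cluster argument, so
it is easy to develop with lapply and switch to parLapply once the
function works.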

Vectorized ATLAS (search for ATLAS in the R Installation and
Administration Guide) and the experimental package pnmath (see a
thread (oops, pun) starting in June with subject Parallel R, for
instance) provide parallelism at a finer grain, i.e., the level of
linear algebra (ATLAS) or R's math library (pnmath).

> system running RHEL5, so if I could use hyperthreading to increase the
> number of (virtual) nodes that would be great too.

The snow-like solutions allow you to launch as many instances of R as
you like (e.g., one per CPU); each operates quasi-independently. Each
instance of R uses its own memory, and for big memory problems this
might limit the number of instances per machine.
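
As a sketch of how a nested loop over pairs might map onto this, one
can hand the outer loop index to parLapply and keep the inner loop
serial inside each task. Everything named here is an assumption for
illustration: pair_stat stands in for the real routine, x is a small
random matrix rather than the 24000-row data, and 2 workers are used
where one per core might be more appropriate. The clusterExport call
is where each worker receives its own copy of x, which is the memory
cost in question.

```r
library(snow)

set.seed(1)
x <- matrix(rnorm(30 * 10), nrow = 30)   # 30 hypothetical probes, 10 arrays
n <- nrow(x)

# Hypothetical pairwise routine; replace with the real calculation
pair_stat <- function(a, b) sum(a * b)

# Work for one value of the outer index: all pairs (i, j) with j > i
one_row <- function(i) {
  js <- seq(i + 1, n)
  vapply(js, function(j) pair_stat(x[i, ], x[j, ]), numeric(1))
}

cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, c("x", "n", "pair_stat"))   # copy data and helper to each worker
res <- parLapply(cl, seq_len(n - 1), one_row)
stopCluster(cl)

# One result per unordered pair
stopifnot(sum(sapply(res, length)) == n * (n - 1) / 2)
```

The rows get shorter as i grows, so the contiguous chunks that
parLapply hands out are unevenly loaded; snow's clusterApplyLB is one
option if that matters.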

ATLAS / pnmath make much better use of resources and work without code
modification. But these solutions only provide benefit when the
calculations are appropriately numerical; many calculations are not
formulated in a way that would take advantage of this.
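
To illustrate what "appropriately numerical" can mean: if the pairwise
routine happened to be something like a Pearson correlation (an
assumption; the original post does not say what is computed), the
whole nested loop collapses to one matrix product, which is the kind
of operation a tuned BLAS such as ATLAS can speed up. The matrix x is
made-up data.

```r
set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50)   # 50 hypothetical genes, 20 samples
n <- nrow(x)

# Loop formulation: one cor() call per pair of rows
loop_res <- matrix(NA_real_, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    loop_res[i, j] <- cor(x[i, ], x[j, ])
  }
}

# Matrix formulation: standardize each row, then a single cross-product
z <- t(scale(t(x)))                       # each row centered, scaled by its sd
mat_res <- tcrossprod(z) / (ncol(x) - 1)

# Both give the same correlations for every pair i < j
ut <- upper.tri(mat_res)
stopifnot(isTRUE(all.equal(loop_res[ut], mat_res[ut])))
```

A routine that cannot be rewritten in this form gains little from
ATLAS or pnmath, and the process-level approach is the one to reach
for.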

A recent post from Prof. Ripley also mentions the benefits that come
from building R with compiler flags tuned to your chip, but I'm not
able to locate the thread at the moment.

If you're coming at this from scratch, on a Linux-based system, then
snow is probably the easiest to get going, using 'socket'-based
clusters.  I use Rmpi and, to a lesser extent, pnmath, both at least
in part because I'm interested in the C-level implementations (MPI and
OpenMP, respectively).


> Appreciate the help.
> Chris

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
