[R] lost in the SNOW at 4 AM; parallelization confusion...

Eric Rupley erupley at umich.edu
Sat Aug 23 11:04:41 CEST 2008



Apologies for what must be a very basic question, but I have not found
any clear examples of how to design the following.

I would like to run an iterative analysis over several processors.  A toy
example of the analysis is attached: a resampling function is run 1,000
times on some data vec, for each combination of two conditioning
variables i and j.

What is the usual way to attack such a problem using snow?  My  
understanding up to this point is that one should:

(1) set up the random number streams so that the processors' draws in
sample() are uncorrelated

(2) make a function myfunc(vec,i,j) which returns the item of interest

(3) set up a wrapper which iterates through i,j, and makes the call to  
the cluster

(4) call the cluster using clusterApply(cl,vec, myfunc)....
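
One way to read steps (1)-(4), as a rough sketch rather than the
established snow idiom: since clusterApply(cl, x, fun, ...) calls fun
once per element of x, the thing to iterate over can be a vector of
replicate counts (one per worker) rather than the data itself, with vec,
i and j passed along as extra arguments.  The names n.workers,
resample.chunk, chunks and grid are made up for illustration, a local
two-worker SOCK cluster is assumed, and clusterSetupRNG() assumes that
rlecuyer is installed.

```r
library(snow)   # assumes snow (and rlecuyer, for the RNG streams) is installed

n.workers <- 2                               # hypothetical worker count
cl <- makeCluster(n.workers, type = "SOCK")  # assumption: local socket cluster
clusterSetupRNG(cl)                          # step 1: independent RNG stream per worker

vec <- runif(1000, 1, 100)

# step 2: one task = n.rep resamples of the whole vector,
# scaled by the conditioning values i and j as in the toy function
resample.chunk <- function(n.rep, vec, i, j) {
  b <- replicate(n.rep, mean(sample(vec, length(vec), replace = TRUE)))
  (sum(b) * j) / i
}

# step 3: every combination of the conditioning values
grid <- expand.grid(i = c(2, 4), j = 5:6)

# step 4: hand each worker a share of the 1000 replicates;
# clusterApply iterates over 'chunks', not over 'vec'
chunks <- rep(1000 / n.workers, n.workers)
grid$d <- sapply(seq_len(nrow(grid)), function(r) {
  parts <- clusterApply(cl, chunks, resample.chunk,
                        vec = vec, i = grid$i[r], j = grid$j[r])
  sum(unlist(parts))                         # recombine the per-worker sums
})

stopCluster(cl)
```

Each clusterApply call here sends a single task to each worker, and the
results end up in a data frame with one row per (i,j) pair instead of
three vectors grown with append().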

Judging by the results attached below, I must be terribly confused.  Any
advice will be appreciated.


Many thanks,
Best,
Eric

--
  Eric Rupley
  University of Michigan, Museum of Anthropology
  1109 Geddes Ave, Rm. 4013
  Ann Arbor, MI 48109-1079

  erupley at umich.edu
  +1.734.276.8572



# set up
#
library(snow)
cl <- makeCluster(7)
#	8 slaves are spawned successfully. 0 failed.
clusterSetupRNG(cl)
#[1] "RNGstream"


vec <- runif(1000,1,100)
d <- NULL; c.j <- NULL;c.i <- NULL

# the toy function

analysis.func <- function (vec,i,j) {
	b <- NULL
	for (k in c(1:1000)) {
		a <- sample(vec,1000,replace=TRUE) # requires randoms...
		b <- append(b, mean(a))
	}
	c <- (sum(b)*j)/i
	return(c)
}


# the "analysis"

system.time(for (i in c(2,4)) { # a series of nested iterations...
	for (j in c(5:6)) {
		d <- append( mean( as.numeric( clusterApply(cl,vec,analysis.func,i,j) ) ), d)
		# this is ugly and contorted; there has to be a better way?
		c.j <- append(j, c.j)
		c.i <- append(i, c.i)
	}
})

#   user  system elapsed
#  9.758   0.291  48.771
#>

# but the old way is faster...

d <- NULL; c.j <- NULL; c.i <- NULL # set up again

system.time(for (i in c(2,4)) { # a series of nested iterations...
	for (j in c(5:6)) {
		d <- append( mean( as.numeric( analysis.func(vec,i,j) ) ), d)
		# keeping it ugly for timing comparison...
		c.j <- append(j, c.j)
		c.i <- append(i, c.i)
	}
})


#   user  system elapsed
#  0.299   0.002   0.299
#>  # arrgrgrgrgrg!!!

stopCluster(cl)
#[1] 1
sessionInfo()
#R version 2.7.1 (2008-06-23)
#i386-apple-darwin8.10.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base
#
#other attached packages:
#[1] rlecuyer_0.1 boot_1.2-33  snow_0.3-3   Rmpi_0.5-5
#
#loaded via a namespace (and not attached):
#[1] tools_2.7.1
date()
#[1] "Sat Aug 23 04:25:50 2008"
#>
#Too late for a drink. Pity.


