[R] bug (?) with lapply / clusterMap / clusterApply etc
jacob at forestlidar.org
Tue Mar 22 18:46:13 CET 2016
Hello, I have encountered a bug (?) with the parallel package. When
parLapply is run from within a function, it appears to copy the
entire parent environment (the environment inside the function) into
every child node in the cluster, one node at a time. This is very
slow, and the copied contents are not even accessible within the
child nodes, even though they show up in the nodes' memory footprint.
I may be misusing the terms "parent" and "node" here...
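A minimal sketch of the R scoping rule that may be involved, offered as an assumption rather than a confirmed diagnosis: an anonymous function created inside another function keeps that function's evaluation frame as its enclosing environment, so a local object such as x stays attached to the function even though the function never mentions it. The name make_worker is hypothetical.

```r
# Hypothetical helper: builds a worker the same way rows_fn1 does,
# i.e. as an anonymous function defined inside another function.
make_worker <- function() {
  x <- rnorm(10)   # local object that the worker never refers to
  function(z) { y <- rnorm(5000); mean(y) }
}

w <- make_worker()
# The worker's enclosing environment is make_worker's frame, which holds x
x_in_closure <- exists("x", envir = environment(w), inherits = FALSE)
print(x_in_closure)
```

If this is what happens on the cluster, the copied x would be reachable through the worker function's own environment on each node rather than through the node's global workspace, which may be why it looks inaccessible.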
The code below demonstrates the issue. The same parallel command is
used twice within the function, once before creating a large object
and once afterwards. Both calls should take nearly the same amount of
time. The first call takes less than 1/100th of a second, but the
second takes hundreds of times longer...
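The before/after comparison can be mirrored without a cluster by measuring how many bytes the worker function occupies once serialized. This is a sketch that assumes parLapply serializes the worker together with its enclosing environment and sends the result to each node; serialized_size and demo_sizes are hypothetical names.

```r
# Hypothetical helper: bytes occupied by an object once serialized.
serialized_size <- function(obj) length(serialize(obj, connection = NULL))

demo_sizes <- function() {
  # Same anonymous worker as in rows_fn1, defined inside a function
  worker <- function(z) { y <- rnorm(5000); mean(y) }
  before <- serialized_size(worker)  # frame does not yet hold a large object
  x <- rnorm(10^6)                   # large local object, never used by worker
  after <- serialized_size(worker)   # frame now also holds x (about 8 MB)
  c(before = before, after = after)
}

sizes <- demo_sizes()
print(sizes)
```

Each double takes 8 bytes, so with 10^7 values, as in rows_fn1, the worker would grow by roughly 80 MB, and under this assumption that payload would be sent to each of the 10 nodes in turn.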
Example Code:
#load the parallel package and create a cluster of nodes
library(parallel)
if(!"clus1" %in% ls()) clus1=makeCluster(10)
#function used to demonstrate bug
rows_fn1=function(x,clus){
  #first set of parallel code
  print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))
  #create large vector (overwrites the argument x)
  x=rnorm(10^7)
  #second set, identical to the first
  print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))
}
#demonstrate bug - watch Task Manager and see Windows slowly copy the vector to each node in the cluster
rows_fn1(1:5000,clus1)
Although the child nodes bloat in proportion to the size of x in the
parent environment, x is not available in the child nodes. The code
above can be tweaked to add more variables (x1, x2, x3, ...), and the
child nodes bloat correspondingly.
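If the enclosing environment is indeed what gets copied, one possible workaround (a sketch, not a confirmed fix) is to give the worker the global environment as its enclosure, or equivalently to define it at the top level, so that only the function itself is serialized. make_light_worker is a hypothetical name.

```r
# Hypothetical variant of rows_fn1's worker: it gets globalenv() as its
# enclosing environment, so serializing it no longer includes the local x.
make_light_worker <- function() {
  x <- rnorm(10^6)                     # large local object, as in rows_fn1
  worker <- function(z) { y <- rnorm(5000); mean(y) }
  environment(worker) <- globalenv()   # detach worker from this frame
  worker
}

light <- make_light_worker()
light_bytes <- length(serialize(light, connection = NULL))
print(light_bytes)
```

With the cluster from the example, the call would then be parLapply(clus1, 1:5, make_light_worker()). The trade-off is that the worker can no longer see the calling function's local variables; anything it needs has to be passed as an extra argument (parLapply forwards its ... to the function) or exported with clusterExport.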
I am working on Windows Server 2012 with 64-bit R version 3.2.1.
I upgraded to R 3.2.4 Revised and observed the same bug.
I have searched online for this issue and have not found anyone else
reporting a similar problem.
I have tried rebooting my machine, without effect (aside from the obvious).
Any suggestions would be greatly appreciated!
With regards,
Jacob L Strunk
Forest Biometrician (PhD), Statistician (MSc)
and Data Munger
More information about the R-help mailing list