[R] How to speed up nested for loop computations
Max Manfrin
mmanfrin at ulb.ac.be
Thu Aug 10 22:10:53 CEST 2006
On 10 Aug 2006, at 18:46, jim holtman wrote:
> It appears that you are trying to partition the dataframe and then
> do some operations. It is probably better to use 'split' to
> generate the set of indices of the partitions and then do the
> operations on the subset. Here is an example that calculates the
> 'mean' of each partition:
>
> > n <- 20
> > x <- data.frame(id=sample(1:3,n,TRUE), type=sample(1:3,n,TRUE),
> value=runif(n))
> > x.split <- split(1:nrow(x), list(x$id, x$type), drop=TRUE)
> > x.split
> $`3.1`
> [1] 1 15 19
>
> $`1.1`
> [1] 2
... cut ...
> > # calculate the number of values in the partition and their mean
>
> > lapply(x.split, function(z) c(length(z),mean(x$value[z])))
> $`3.1`
> [1] 3.0000000 0.3120459
>
> $`1.1`
> [1] 1.0000000 0.5642638
... cut ...
> You should be able to extend this approach to your data.
I tried to follow your suggestion, and I do indeed have to partition
the data frame. For each problem instance ("instance") of a given size
(there are 2 instances per size in this example), each search algorithm
("idalgo") (I am testing 78 algorithms), and each trial ("try") (each
algorithm is run 30 times on each instance), my complete data set
contains all the best-so-far solution values ("best") found by every
CPU (my parallel algorithm runs on 8 CPUs) over the course of the
search.
I therefore applied the following command to the res data frame:
> res.split <- split(res, list(res$instance, res$try, res$idalgo),
                     drop=TRUE)
For every partition (I have 4680 partitions of the form
instance.try.idalgo) I need to identify the best solution found, that
is, among the 8 CPUs, the one with the lowest value of "best".
Unfortunately, the split command does not give me back the row indices
of the res data frame as in your example; it gives me a subset of res
instead, so I don't know how to write the lapply function so that it
returns the indices of the rows in res containing the minimum value of
best for each partition.
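One possible way to get row indices back, as in the quoted example, would be to split the row numbers instead of the data frame itself. The following is only a minimal sketch: `res_demo` is a small made-up data frame (hypothetical, not the real data) with the same grouping columns, and `idx.split` and `min.idx` are illustrative names.

```r
# Hypothetical toy data standing in for the real 'res' data frame
res_demo <- data.frame(
  instance = factor(c("a", "a", "a", "b", "b", "b")),
  try      = rep(1L, 6),
  idalgo   = factor(rep("X", 6)),
  cpu_id   = c(0L, 1L, 2L, 0L, 1L, 2L),
  best     = c(30L, 10L, 20L, 7L, 9L, 5L)
)

# Split the row numbers (not the data frame) by partition
idx.split <- split(seq_len(nrow(res_demo)),
                   list(res_demo$instance, res_demo$try, res_demo$idalgo),
                   drop = TRUE)

# For each partition, pick the row index in res_demo with the lowest 'best'
min.idx <- sapply(idx.split, function(z) z[which.min(res_demo$best[z])])

# Rows of res_demo holding the per-partition minimum
res_demo[min.idx, ]
```

Note that `which.min` returns only the first minimum, so if several rows in a partition tie for the lowest value, this sketch keeps just one of them.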
Here is an example with a subset of the data:
> optimal_values<-read.table("optimal_values_80.txt",header=TRUE)
> resPIR2OPT<-read.table("parallel_independent_2-opt_80_800.txt",header=TRUE)
> resSEQ2OPT<-read.table("sequential_2-opt_80_6400.txt",header=TRUE)
> resSEQ22OPT<-read.table("sequential2_2-opt_80_800.txt",header=TRUE)
>
> res<-rbind(resPIR2OPT,resSEQ2OPT,resSEQ22OPT)
> str(res)
`data.frame': 14774 obs. of 11 variables:
$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1 1 1 1 1 1 1 1 ...
$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
$ ls       : int 2 2 2 2 2 2 2 2 2 2 ...
$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ cpu_id   : int 0 0 0 0 0 0 0 0 0 0 ...
$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1 1 1 1 1 ...
$ try      : int 1 1 1 1 1 1 1 1 1 1 ...
$ best     : int 255289 255250 255209 255112 254991 254971 254969 254897 254893 254892 ...
$ time     : num 0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72 1.78 1.93 ...
$ iteration: int 1 1 1 2 13 18 19 22 23 26 ...
> res.split <- split(res, list(res$instance, res$try, res$idalgo),
drop=TRUE)
> str(res.split)
List of 180
$ lipa80a.1.PIR-2opt :`data.frame': 184 obs. of 11 variables:
..$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
..$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
..$ ls       : int [1:184] 2 2 2 2 2 2 2 2 2 2 ...
..$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1 1 ...
..$ cpu_id   : int [1:184] 0 0 0 0 0 0 0 0 0 0 ...
..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1 1 1 1 1 ...
..$ try      : int [1:184] 1 1 1 1 1 1 1 1 1 1 ...
..$ best     : int [1:184] 255289 255250 255209 255112 254991 254971 254969 254897 254893 254892 ...
..$ time     : num [1:184] 0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72 1.78 1.93 ...
..$ iteration: int [1:184] 1 1 1 2 13 18 19 22 23 26 ...
$ lipa80a.2.PIR-2opt :`data.frame': 230 obs. of 11 variables:
..$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
..$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1 1 ...
..$ ls       : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
..$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1 1 ...
..$ cpu_id   : int [1:230] 0 0 0 0 0 0 0 0 0 0 ...
..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1 1 1 1 1 ...
..$ try      : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
..$ best     : int [1:230] 255557 255264 255235 255201 255193 255192 255186 255103 254990 254971 ...
..$ time     : num [1:230] 0.09 0.09 0.19 0.19 0.37 1.29 1.36 1.36 1.58 1.89 ...
..$ iteration: int [1:230] 1 1 2 2 4 15 16 16 19 24 ...
My question now is: how do I extract from each partition the row with
the minimum value of best? I need to draw boxplots of these minima.
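Working directly on the list of data frame subsets, one approach that might do this is to pick the `which.min` row inside each partition and bind the results back together. This is only a sketch under stated assumptions: `res_demo`, `demo.split` and `best.rows` are made-up names, the toy data are not the real results, and grouping the boxplot by `idalgo` is just one guess at the intended plot.

```r
# Hypothetical toy data standing in for the real 'res' data frame
res_demo <- data.frame(
  instance = factor(c("a", "a", "a", "b", "b", "b")),
  try      = rep(1L, 6),
  idalgo   = factor(c("X", "X", "X", "Y", "Y", "Y")),
  best     = c(30L, 10L, 20L, 7L, 9L, 5L)
)

# Same kind of split as res.split: a list of data frame subsets
demo.split <- split(res_demo,
                    list(res_demo$instance, res_demo$try, res_demo$idalgo),
                    drop = TRUE)

# Keep, from each partition, the single row with the lowest 'best'
best.rows <- do.call(rbind,
                     lapply(demo.split, function(d) d[which.min(d$best), ]))

# One box per algorithm, built from the per-partition minima
boxplot(best ~ idalgo, data = best.rows)
```

The result `best.rows` is an ordinary data frame with one row per partition, so it can be passed to `boxplot` with whatever grouping formula is needed.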
Thanks again in advance for any help anybody could give.
----
Max MANFRIN
http://iridia.ulb.ac.be/~mmanfrin/