[R] how to get rid of 2 for-loops and optimize runtime
joris meys
jorismeys at gmail.com
Mon Oct 19 16:12:01 CEST 2009
Hi Ian,
First of all, take a look at the functions sapply, mapply, lapply,
tapply, ...: they are usually a cleaner, and often a more efficient,
way of implementing loops.
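
A minimal sketch of what these functions look like next to an explicit loop. The vectors amount and service are made-up toy data, not your real columns, and how much time you save depends on the task:

```r
# Hypothetical toy data, loosely modelled on the example table
amount  <- c(120, 80, 40, 50)
service <- c("A", "B", "A", "A")

# Explicit loop: total amount per service
totals_loop <- c(A = 0, B = 0)
for (k in seq_along(amount)) {
  totals_loop[service[k]] <- totals_loop[service[k]] + amount[k]
}

# tapply: apply sum() to 'amount' within each 'service' group
totals_tapply <- tapply(amount, service, sum)

# sapply: apply a function to every element and simplify to a vector
doubled <- sapply(amount, function(x) 2 * x)
```

Here totals_loop and totals_tapply hold the same per-service totals; tapply just does the grouping for you.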
Second, could you elaborate a bit on the data set: is the amount of
the month before a single value from another row, or the sum of all
values in the previous month? I saw in your example dataset that the
last month has 2 rows, but couldn't figure out whether that's a typo
or really means something. That information is necessary to optimize
your code. 129 s is indeed far too long for such a simple operation.
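
If it turns out to be the sum over the previous month, something along these lines might already do it without any explicit loop. This is only a sketch under that assumption: the data frame dat and its column names are hypothetical stand-ins for your dataset, and it groups by service only (the department could be pasted into the keys in the same way):

```r
# Hypothetical stand-in for the real data (column names are assumptions)
dat <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# A running month index, so that the month before January is simply
# the previous index (December of the year before)
month_index <- dat$year * 12 + dat$month

# One key per row for its own month, and one for the month before
key_now  <- paste(dat$service, month_index)
key_prev <- paste(dat$service, month_index - 1)

# Total amount per (service, month), computed once
totals <- tapply(dat$amount, key_now, sum)

# Look up each row's previous-month total; NA when there is none
dat$amount_lastmonth <- as.vector(totals)[match(key_prev, names(totals))]
```

Each row is touched a fixed number of times, so the run time should grow roughly linearly with the number of rows instead of quadratically.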
Cheers
Joris
On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems
<ian.willems at uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> I want to calculate, for each row, the amount from the month before. I use a matrix with 2100 rows and 22 columns (which is still a very small matrix; the number of rows of other matrices can easily exceed 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...  80
> 2009  8      Q3       082009     A        ...  40
> 2009  7      Q3       072009     A        ...  50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...  80      ...
> 2009  8      Q3       082009     A        ...  40      50
> 2009  7      Q3       072009     A        ...  50      ...
>
> The table is not exactly the same, but it gives a good idea of what I have and what I want.
>
> The code I have written (see below) does what I want, but it is very slow: it takes 129 s for 400 rows, and the time quadruples each time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and summaryRprof to analyse your code (output below).
> However, I don't really understand the output.
> I guess I need code that runs in linear time, which means getting rid of the 2 for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2 on Windows XP Service Pack 3.
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,3]  = year
> dataset[,5]  = month
> dataset[,14] = servicetype
> dataset[,18] = department
> dataset[,22] = amount
>
> [CODE]
> # For each row j, sum the amounts of all rows i that belong to the
> # previous month and have the same service type and department.
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {      # the same service type
>       if (dataset[j,18] == dataset[i,18]) {    # the same department
>         if (dataset[j,5] == "1") {             # if month = 1, month ago is 12 and year is -1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3] - 1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {                               # if month != 1, month ago is month - 1
>           if ((dataset[j,5] - 1) == dataset[i,5]) {
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
> [/CODE]
>
>> summaryRprof()
> $by.self
> self.time self.pct total.time total.pct
> [.data.frame 33.92 26.2 80.90 62.5
> NextMethod 12.68 9.8 12.68 9.8
> [.factor 8.60 6.6 18.36 14.2
> Ops.factor 8.10 6.3 40.08 31.0
> sort.int 6.82 5.3 13.70 10.6
> [ 6.70 5.2 85.44 66.0
> names 6.54 5.1 6.54 5.1
> length 5.66 4.4 5.66 4.4
> == 5.04 3.9 44.92 34.7
> levels 4.80 3.7 5.56 4.3
> is.na 4.24 3.3 4.24 3.3
> dim 3.66 2.8 3.66 2.8
> switch 3.60 2.8 3.80 2.9
> vector 2.68 2.1 8.02 6.2
> inherits 1.90 1.5 1.90 1.5
> any 1.68 1.3 1.68 1.3
> noNA.levels 1.46 1.1 7.84 6.1
> .Call 1.40 1.1 1.40 1.1
> ! 1.26 1.0 1.26 1.0
> attr<- 1.06 0.8 1.06 0.8
> .subset 1.00 0.8 1.00 0.8
> class<- 0.82 0.6 0.82 0.6
> != 0.80 0.6 0.80 0.6
> levels.default 0.68 0.5 0.76 0.6
> all 0.62 0.5 0.62 0.5
> < 0.54 0.4 0.54 0.4
> - 0.48 0.4 0.48 0.4
> is.factor 0.44 0.3 2.34 1.8
> .subset2 0.38 0.3 0.38 0.3
> attr 0.36 0.3 0.36 0.3
> is.character 0.28 0.2 0.28 0.2
> is.null 0.28 0.2 0.28 0.2
> | 0.26 0.2 0.26 0.2
> oldClass<- 0.20 0.2 0.20 0.2
> is.atomic 0.16 0.1 0.16 0.1
> nzchar 0.10 0.1 0.10 0.1
> is.numeric 0.06 0.0 0.06 0.0
> oldClass 0.06 0.0 0.06 0.0
> ( 0.04 0.0 0.04 0.0
> [.data 0.02 0.0 0.02 0.0
>
> $by.total
> total.time total.pct self.time self.pct
> [ 85.44 66.0 6.70 5.2
> [.data.frame 80.90 62.5 33.92 26.2
> == 44.92 34.7 5.04 3.9
> Ops.factor 40.08 31.0 8.10 6.3
> [.factor 18.36 14.2 8.60 6.6
> sort.int 13.70 10.6 6.82 5.3
> NextMethod 12.68 9.8 12.68 9.8
> vector 8.02 6.2 2.68 2.1
> noNA.levels 7.84 6.1 1.46 1.1
> names 6.54 5.1 6.54 5.1
> length 5.66 4.4 5.66 4.4
> levels 5.56 4.3 4.80 3.7
> is.na 4.24 3.3 4.24 3.3
> switch 3.80 2.9 3.60 2.8
> dim 3.66 2.8 3.66 2.8
> is.factor 2.34 1.8 0.44 0.3
> inherits 1.90 1.5 1.90 1.5
> any 1.68 1.3 1.68 1.3
> .Call 1.40 1.1 1.40 1.1
> ! 1.26 1.0 1.26 1.0
> attr<- 1.06 0.8 1.06 0.8
> .subset 1.00 0.8 1.00 0.8
> class<- 0.82 0.6 0.82 0.6
> != 0.80 0.6 0.80 0.6
> levels.default 0.76 0.6 0.68 0.5
> all 0.62 0.5 0.62 0.5
> < 0.54 0.4 0.54 0.4
> - 0.48 0.4 0.48 0.4
> .subset2 0.38 0.3 0.38 0.3
> attr 0.36 0.3 0.36 0.3
> is.character 0.28 0.2 0.28 0.2
> is.null 0.28 0.2 0.28 0.2
> | 0.26 0.2 0.26 0.2
> oldClass<- 0.20 0.2 0.20 0.2
> is.atomic 0.16 0.1 0.16 0.1
> nzchar 0.10 0.1 0.10 0.1
> is.numeric 0.06 0.0 0.06 0.0
> oldClass 0.06 0.0 0.06 0.0
> ( 0.04 0.0 0.04 0.0
> [.data 0.02 0.0 0.02 0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
> incomplete final line found on 'Rprof.out'
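
One way to read the profile above: in $by.total, `[` accounts for 66% of the total time and `==` for about 35%, with [.data.frame and Ops.factor right underneath. So most of the time seems to go into indexing the data frame and comparing factors inside the inner loop. A common remedy, even before removing the loops, is to pull the columns out into plain vectors once. The sketch below uses a made-up data frame and two hypothetical helper functions (count_df, count_vec) to show the two patterns side by side:

```r
set.seed(1)

# Hypothetical data frame with a factor column, standing in for 'dataset'
n  <- 100
df <- data.frame(service = factor(sample(c("A", "B"), n, replace = TRUE)),
                 amount  = runif(n))

# Slow pattern: index the data frame (and compare factors) in the inner loop
count_df <- function(df) {
  hits <- 0
  for (j in seq_len(nrow(df))) {
    for (i in seq_len(nrow(df))) {
      if (df[j, 1] == df[i, 1]) hits <- hits + 1
    }
  }
  hits
}

# Faster pattern: extract a plain character vector once, then loop over it
count_vec <- function(df) {
  service <- as.character(df$service)
  hits <- 0
  for (j in seq_along(service)) {
    for (i in seq_along(service)) {
      if (service[j] == service[i]) hits <- hits + 1
    }
  }
  hits
}

# Compare the timings; both functions count the same matching pairs
time_df  <- system.time(a <- count_df(df))
time_vec <- system.time(b <- count_vec(df))
```

Both versions are still quadratic in the number of rows, so this only lowers the cost per comparison; it does not replace a vectorised solution.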