[R] efficient rolling rank
Charles C. Berry
cberry at tajo.ucsd.edu
Sun Apr 18 20:58:33 CEST 2010
On Sun, 18 Apr 2010, zerdna wrote:
>
> Gabor, Charles, Whit -- I've been walking the woods of R alone so far, and I
> have to say that your replies to that trivial question are an eye-opening
> experience for me. Gentlemen, what I am trying to say in a roundabout way is
> that I am extremely grateful and that you guys are frigging awesome.
>
> Let me outline the times I am getting for the different proposed solutions on
> the same machine, same data, same version of R:
>
> x<-rnorm(50000); len<-100
>
> 1. my naive roll.rank
>
> system.time(x.rank.1<-roll.rank(x,len))
> user system elapsed
> 6.405 0.488 6.94
>
> 2. Gabor's zoo
>
> z<-zoo(x)
> system.time(rollapply(z,len, function(x) rank(x)[len]))
> user system elapsed
> 6.195 0.361 6.554
>
> 3. Charles's embed
>
> system.time(x.rank <- rowSums(x[ -(1:(len-1)) ] >= embed(x,len) ))
> user system elapsed
> 0.181 0.055 0.236
>
>
> 4. Whit's fts
> dat<-fts(x)
> system.time(x.rank<-moving.rank(dat, len))
> user system elapsed
> 0.036 0 0.036
>
> 5. Charles's suggestion with inline, my crude implementation
>
> sig<-signature(x="numeric", rank="integer", n="integer", len="integer")
> code<-"int k=0; for(int i=*len-1; i< *n; i++) {int r=1;
>   for(int j=i-1; j> i-*len; j--) r+=(x[i]>x[j] ?1:0);
>   rank[k++]=r;}"
> fns<-cfunction(sig,code, convention=".C")
>
> system.time( x.rank<-fns(x, integer(length(x)-len+1), length(x), len)$rank )
>
> user system elapsed
> 0.011 0 0.011
>
>
> I guess I could speed it up from time proportional to length(x)*len
> to time proportional to length(x)*log(len) if I used a slightly more
> intelligent algorithm, but this works fine for my requirements. The only thing I
> really wonder about is why exactly R takes 640 times longer than this C code.
> It would be immensely enlightening if someone could point to an explanation
> of how execution in R works and where and when it slows down like this.

Well, you can always read the source code.

But short of that, see

	?Rprof

then try stuff like this:

> x <- rnorm(50000)
> len <- 100
> Rprof()
> x.rank <- rowSums(x[ -(1:(len-1)) ] >= embed(x,len) )
> Rprof(NULL)
> summaryRprof()
$by.self
              self.time self.pct total.time total.pct
embed              0.10     31.2       0.22      68.8
>=                 0.08     25.0       0.08      25.0
+                  0.06     18.8       0.06      18.8
-                  0.04     12.5       0.04      12.5
rowSums            0.02      6.2       0.32     100.0
rep.int            0.02      6.2       0.02       6.2
inherits           0.00      0.0       0.30      93.8
is.data.frame      0.00      0.0       0.30      93.8

$by.total
              total.time total.pct self.time self.pct
rowSums             0.32     100.0      0.02      6.2
inherits            0.30      93.8      0.00      0.0
is.data.frame       0.30      93.8      0.00      0.0
embed               0.22      68.8      0.10     31.2
>=                  0.08      25.0      0.08     25.0
+                   0.06      18.8      0.06     18.8
-                   0.04      12.5      0.04     12.5
rep.int             0.02       6.2      0.02      6.2

$sampling.time
[1] 0.32

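To see where the time goes, it can help to put an explicit loop next to the embed() one-liner and check that both give the same answer. The following is only a sketch under a few assumptions: roll_rank_loop and roll_rank_embed are names made up here (the original roll.rank was not posted, so this loop is not that function), len is at least 2, and the data have no ties, so counting with >= matches rank(window)[len].

```r
# Hypothetical loop version (illustrative name, not the poster's roll.rank):
# one small interpreted computation per window.
roll_rank_loop <- function(x, len) {
  n <- length(x)
  out <- integer(n - len + 1)
  for (i in len:n) {
    window <- x[(i - len + 1):i]
    # rank of the newest point within its own window (it counts itself via >=)
    out[i - len + 1] <- sum(x[i] >= window)
  }
  out
}

# Vectorized version from the thread: embed() builds the
# (n - len + 1) x len matrix of windows once; >= and rowSums()
# then each work on the whole matrix in a single call.
roll_rank_embed <- function(x, len) {
  rowSums(x[-(1:(len - 1))] >= embed(x, len))
}

set.seed(1)
x <- rnorm(5000)
len <- 100

r_loop  <- roll_rank_loop(x, len)
r_embed <- roll_rank_embed(x, len)

# Both approaches should agree element by element.
stopifnot(all(r_loop == r_embed))

# Timings depend on the machine and R version, so none are quoted here.
system.time(for (k in 1:10) roll_rank_loop(x, len))
system.time(for (k in 1:10) roll_rank_embed(x, len))
```

The loop makes one pass through the interpreter per window, while the embed() version makes a handful of calls that each process the full matrix. Wrapping either call between Rprof() and Rprof(NULL) as above shows which of those calls dominate on your setup.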
HTH,
Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901