[R] dist like function but where you can configure the method
Witold E Wolski
wewolski at gmail.com
Fri May 16 22:00:15 CEST 2014
Ouch,
First: my question was not how to implement dist, but whether there is a
more generic dist function than stats::dist.
Second: ks.test is meant as a placeholder (see the comment in the
code I sent) for any other function taking two vector arguments.
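
To make the request concrete, here is a minimal sketch of what such a generic function could look like. It is only an illustration under stated assumptions, not an existing function in stats: the name generic_dist is hypothetical, it compares columns of x (as in the loops quoted further down), and FUN is assumed to return a single number for two vectors.

```r
# Hypothetical helper (not part of stats): apply FUN to every pair of
# columns of x and return the results as an object of class "dist".
generic_dist <- function(x, FUN, ...) {
  x <- as.matrix(x)
  p <- ncol(x)
  stopifnot(p >= 2)
  # All column index pairs with i < j, one pair per column of 'pairs'
  pairs <- combn(p, 2)
  # FUN is assumed to return one numeric value for two vectors
  d <- vapply(seq_len(ncol(pairs)), function(k) {
    FUN(x[, pairs[1, k]], x[, pairs[2, k]], ...)
  }, numeric(1))
  # combn() order matches the column-wise lower triangle stored by "dist"
  attr(d, "Size") <- p
  attr(d, "Labels") <- colnames(x)
  attr(d, "Diag") <- FALSE
  attr(d, "Upper") <- FALSE
  class(d) <- "dist"
  d
}

# Example use with a Euclidean distance standing in for the placeholder
euclid <- function(a, b) sqrt(sum((a - b)^2))
x <- matrix(seq_len(20), nrow = 5)
generic_dist(x, euclid)
```

Any other two-vector function could be passed the same way, for example function(a, b) ks.test(a, b)$statistic.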
Third: I do subscribe to the idea that a function call is easier to
read and understand than a for loop. @Bert: apply is a native C
function and the loop is not interpreted, AFAIK.
@Rui @Barry @Jari: What do you benchmark? An empty loop?
Look at the trivial benchmarks below: _apply_ clearly outperforms a
for loop in R. It always has; it outperforms even an empty for loop.
# an empty, unrealistic for loop as suggested by Rui, Barry and Jari
f1 <- function(n){
  for(i in 1:n){
    for(j in 1:n){
    }
  }
}
myfunc = function(x,y=x){x-y}
# a for loop which actually does something
f2 <- function(n){
  mm <- matrix(0,ncol=n,nrow=n)
  for(i in 1:n){
    for(j in 1:n){
      mm[i,j] = myfunc(i,j)
    }
  }
  return(mm)
}
# a single loop filling a vector of length n*n
f3 = function(n){
  res = rep(0,n*n)
  for(i in 1:(n*n))
  {
    res[i] = myfunc(i)
  }
  return(res)
}
n = 1000
system.time(f1(n))
system.time(f2(n))
system.time(f3(n))
system.time(apply(t(1:(n*n)),1,myfunc))
> system.time(f1(n))
   user  system elapsed
   0.28    0.00    0.28
> system.time(f2(n))
   user  system elapsed
   6.80    0.00    7.09
> system.time(f3(n))
   user  system elapsed
   5.83    0.00    5.98
> system.time(apply(t(1:(n*n)),1,myfunc))
   user  system elapsed
   0.19    0.00    0.19
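
One caveat when reading the last timing: t(1:(n*n)) is a matrix with a single row, so apply(..., 1, myfunc) hands that whole row to myfunc in one vectorized call instead of calling it n*n times, and base::apply is itself written in R and loops internally. The sketch below counts the calls to check this. The names counter and counted are hypothetical helpers introduced only for the illustration, and n is kept small.

```r
myfunc <- function(x, y = x) { x - y }
n <- 100

# Hypothetical call counter, kept in an environment so it can be updated
counter <- new.env()
counted <- function(x) {
  counter$calls <- counter$calls + 1L
  myfunc(x)
}

# apply over the single row: myfunc receives the whole vector at once
counter$calls <- 0L
res_apply <- apply(t(1:(n * n)), 1, counted)
calls_apply <- counter$calls

# vapply over the elements: one call per element, like the loop in f3
counter$calls <- 0L
res_vapply <- vapply(1:(n * n), counted, integer(1))
calls_vapply <- counter$calls

c(apply = calls_apply, vapply = calls_vapply)
```

A like-for-like timing would therefore compare f3 with the per-element vapply (or sapply) variant rather than with the single vectorized call.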
On 16 May 2014 20:55, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> The compiler package is good at speeding up for loops, but in this case the
> gain is negligible. The ks test is the real bottleneck.
>
> library(compiler)
>
> f1 <- function(){
>   # 100 x 100 = 10000 ks tests in a nested loop
>   for(i in 1:100){
>     for(j in 1:100){
>       ks.test(runif(100),runif(100))
>     }
>   }
> }
>
> f1.c <- cmpfun(f1)
>
> system.time(f1())
> user system elapsed
> 3.50 0.00 3.53
> system.time(f1.c())
> user system elapsed
> 3.47 0.00 3.48
>
>
> Rui Barradas
>
> Em 16-05-2014 17:12, Barry Rowlingson escreveu:
>>
>> On Fri, May 16, 2014 at 4:46 PM, Witold E Wolski <wewolski at gmail.com>
>> wrote:
>>>
>>> Dear Jari,
>>>
>>> Thanks for your reply...
>>>
>>> The overhead would be
>>> 2 for loops
>>> for(i in 1:dim(x)[2])
>>> for(j in i:dim(x)[2])
>>>
>>> isn't it? Or are you seeing a different way to implement it?
>>>
>>> A for loop is pretty expensive in R. Therefore I am looking for an
>>> implementation similar to apply or lapply where the iteration is done
>>> in native code.
>>
>>
>> No, a for loop is not pretty expensive in R -- at least not compared
>> to doing a k-s test:
>>
>> > system.time(for(i in 1:10000){ks.test(runif(100),runif(100))})
>> user system elapsed
>> 3.680 0.012 3.697
>>
>> 3.68 seconds to do 10000 ks tests (and generate 200 runifs)
>>
>> > system.time(for(i in 1:10000){})
>> user system elapsed
>> 0.000 0.000 0.001
>>
>> 0.000 s to do 10000 loops. Oh, let's nest it for fun:
>>
>> > system.time(for(i in 1:100){for(i in 1:100){ks.test(runif(100),runif(100))}})
>> user system elapsed
>> 3.692 0.004 3.701
>>
>> No different. Even a ks-test with only 5 items is taking me 2.2 seconds.
>>
>> Moral: don't worry about the for loops.
>>
>> Barry
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
--
Witold Eryk Wolski