[R] dist like function but where you can configure the method

Fri May 16 22:00:15 CEST 2014

Ouch,

First: my question was not how to implement dist, but whether there is
a more generic dist function than stats::dist.

Secondly: ks.test is meant as a placeholder (see the comment in the
code I sent) for any other function taking two vector arguments.
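
To make the request concrete, here is a minimal sketch of the kind of
function being asked for: a dist-like wrapper that accepts any function
of two vectors. The names pairwise_dist and ks_stat are hypothetical
(they are not part of base R or stats), and the sketch assumes the
supplied function is symmetric and returns a single number.

```r
# Hypothetical helper (not part of base R or the stats package):
# pairwise "distance" between the columns of x, computed with any
# function FUN that takes two vectors and returns a single number.
pairwise_dist <- function(x, FUN, ...) {
  n <- ncol(x)
  res <- matrix(0, nrow = n, ncol = n,
                dimnames = list(colnames(x), colnames(x)))
  if (n < 2) return(as.dist(res))
  idx <- combn(n, 2)                      # all column pairs with i < j
  vals <- vapply(seq_len(ncol(idx)), function(k) {
    FUN(x[, idx[1, k]], x[, idx[2, k]], ...)
  }, numeric(1))
  res[t(idx)] <- vals                     # fill the upper triangle
  res <- res + t(res)                     # mirror; assumes FUN is symmetric
  as.dist(res)
}

# ks.test statistic, used as the placeholder from this thread
ks_stat <- function(a, b) unname(ks.test(a, b)$statistic)

set.seed(1)
x <- matrix(runif(100 * 4), ncol = 4)
d <- pairwise_dist(x, ks_stat)
```

Passing function(a, b) sqrt(sum((a - b)^2)) as FUN should reproduce
dist(t(x)), which is a quick sanity check for the wrapper.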

Third: I do subscribe to the idea that a function call is easier to
read and understand than a for loop. @Bert: apply is a native C
function and, AFAIK, its loop is not interpreted.

@Rui @Barry @Jari: What are you benchmarking? An empty loop?

Look at the trivial benchmarks below: _apply_ clearly outperforms a
for loop in R. It always has; it even outperforms an empty for loop.

# an empty, unrealistic for loop, as suggested by Rui, Barry and Jari
f1 <- function(n){
  for(i in 1:n){
    for(j in 1:n){
    }
  }
}

myfunc = function(x,y=x){x-y}

# a for loop which actually does something
f2 <- function(n){
  mm <- matrix(0,ncol=n,nrow=n)
  for(i in 1:n){
    for(j in 1:n){
      mm[i,j] = myfunc(i,j)
    }
  }
  return(mm)
}

# and the same with an array (a flat vector)
f3 = function(n){
  res = rep(0,n*n)
  for(i in 1:(n*n)){
    res[i] = myfunc(i)
  }
  return(res)
}

n = 1000
system.time(f1(n))
system.time(f2(n))
system.time(f3(n))
system.time(apply(t(1:(n*n)),1,myfunc))

> system.time(f1(n))
   user  system elapsed
   0.28    0.00    0.28
> system.time(f2(n))
   user  system elapsed
   6.80    0.00    7.09
> system.time(f3(n))
   user  system elapsed
   5.83    0.00    5.98
> system.time(apply(t(1:(n*n)),1,myfunc))
   user  system elapsed
   0.19    0.00    0.19
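
For a like-for-like timing, each variant has to call the function the
same number of times. Note that apply() over margin 1 of a one-row
matrix such as t(1:(n*n)) hands the whole row to the function in a
single call. The sketch below is one way to set up a comparison in
which both variants call myfunc once per element. The helper names
loop_version and vapply_version are hypothetical, vapply is only an
assumed stand-in for the apply family, and no timings are claimed here.

```r
myfunc <- function(x, y = x) { x - y }

# Hypothetical helpers: both call myfunc exactly once per element.
loop_version <- function(n) {
  res <- numeric(n)
  for (i in seq_len(n)) {
    res[i] <- myfunc(i)
  }
  res
}

vapply_version <- function(n) {
  vapply(seq_len(n), myfunc, numeric(1))
}

n <- 1e5
# Both variants must agree before their timings are compared.
stopifnot(isTRUE(all.equal(loop_version(n), vapply_version(n))))

system.time(loop_version(n))
system.time(vapply_version(n))
```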

On 16 May 2014 20:55, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> The compiler package is good at speeding up for loops but in this case the
> gain is negligible. The ks test is the real time problem.
>
> library(compiler)
>
> f1 <- function(n){
>
>         for(i in 1:100){
>                 for(i in 1:100){
>                         ks.test(runif(100),runif(100))
>                 }
>         }
> }
>
> f1.c <- cmpfun(f1)
>
> system.time(f1())
>    user  system elapsed
>    3.50    0.00    3.53
> system.time(f1.c())
>    user  system elapsed
>    3.47    0.00    3.48
>
>
> Rui Barradas
>
> On 16-05-2014 17:12, Barry Rowlingson wrote:
>>
>> On Fri, May 16, 2014 at 4:46 PM, Witold E Wolski <wewolski at gmail.com>
>> wrote:
>>>
>>> Dear Jari,
>>>
>>> Thanks for your reply...
>>>
>>> The overhead would be
>>> 2 for loops
>>> for(i in 1:dim(x)[2])
>>> for(j in i:dim(x)[2])
>>>
>>> isn't it? Or are you seeing a different way to implement it?
>>>
>>> A for loop is pretty expensive in R. Therefore I am looking for an
>>> implementation similar to apply or lapply where the iteration is made
>>> in native code.
>>
>>
>> No, a for loop is not pretty expensive in R -- at least not compared
>> to doing a k-s test:
>>
>>   > system.time(for(i in 1:10000){ks.test(runif(100),runif(100))})
>>     user  system elapsed
>>    3.680   0.012   3.697
>>
>>   3.68 seconds to do 10000 ks tests (and generate 200 runifs)
>>
>>   > system.time(for(i in 1:10000){})
>>     user  system elapsed
>>    0.000   0.000   0.001
>>
>>   0.000s to do 10000 loops. Oh, let's nest it for fun:
>>
>>   > system.time(for(i in 1:100){for(i in 1:100){ks.test(runif(100),runif(100))}})
>>     user  system elapsed
>>    3.692   0.004   3.701
>>
>>   no different. Even a ks-test with only 5 items is taking me 2.2 seconds.
>>
>> Moral: don't worry about the for loops.
>>
>> Barry
>>
>

-- 
Witold Eryk Wolski