[R] loop vs. apply(): strange behavior with data frame?
Roberto Perdisci
roberto.perdisci at gmail.com
Thu Oct 22 02:17:25 CEST 2009
Hi everybody,
I noticed a strange behavior when using loops versus apply() on a data frame.
The example below explicitly computes a distance matrix for a given
dataset. When the dataset is a matrix, everything works fine. But when
the dataset is a data.frame, the dist.for function, written with
nested loops, takes far longer than dist.apply (about 35 s versus
0.2 s in the output further down).
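For reference, the distance computed here appears to be the mean absolute difference between two rows, i.e. the Manhattan distance divided by the number of columns. A minimal sketch of that reading, using stats::dist on a small made-up matrix (the toy data and the names x, d_builtin and d_manual are illustrative assumptions, not part of the original script):

```r
# Toy data (hypothetical): 4 rows, 3 columns.
x <- matrix(c(1, 2, 3,
              2, 4, 6,
              0, 0, 0,
              1, 1, 1), nrow = 4, byrow = TRUE)

# Manhattan distance between rows, divided by the number of columns,
# gives the mean absolute difference per coordinate.
d_builtin <- dist(x, method = "manhattan") / ncol(x)

# Explicit computation for rows 1 and 2:
# (|1-2| + |2-4| + |3-6|) / 3 = 6 / 3 = 2
d_manual <- sum(abs(x[1, ] - x[2, ])) / ncol(x)

as.matrix(d_builtin)[1, 2]  # 2
d_manual                    # 2
```

If that reading is right, the built-in result can serve as an independent check on both hand-written versions.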
######## USING FOR #######
dist.for <- function(data) {
  # r x r result matrix, filled pair by pair
  d <- matrix(0,nrow=nrow(data),ncol=nrow(data))
  n <- ncol(data)
  r <- nrow(data)
  for(i in 1:r) {
    for(j in 1:r) {
      # mean absolute difference between rows i and j
      d[i,j] <- sum(abs(data[i,]-data[j,]))/n
    }
  }
  return(as.dist(d))
}
######## USING APPLY #######
# distances from one row to every row of data.rest
f <- function(data.row,data.rest) {
  return(as.double(apply(data.rest,1,g,data.row)))
}
# mean absolute difference between two rows
g <- function(row2,row1) {
  return(sum(abs(row1-row2))/length(row1))
}
dist.apply <- function(data) {
  d <- apply(data,1,f,data)
  return(as.dist(d))
}
######## TESTING #######
library(mvtnorm)
data <- rmvnorm(100,mean=seq(1,10),sigma=diag(1,nrow=10,ncol=10))
tf <- system.time(df <- dist.for(data))
ta <- system.time(da <- dist.apply(data))
print(paste('diff = ',sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)
print('----------------------------------')
print('Same experiment on data.frame...')
data2 <- as.data.frame(data)
tf <- system.time(df <- dist.for(data2))
ta <- system.time(da <- dist.apply(data2))
print(paste('diff = ',sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)
########################
Here is the output I get on my system (R version 2.7.1 on a Debian lenny)
[1] "diff = 0"
[1] "tf = "
   user  system elapsed
  0.088   0.000   0.087
[1] "ta = "
   user  system elapsed
  0.128   0.000   0.128
[1] "----------------------------------"
[1] "Same experiment on data.frame..."
[1] "diff = 0"
[1] "tf = "
   user  system elapsed
 35.031   0.000  35.029
[1] "ta = "
   user  system elapsed
  0.184   0.000   0.185
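A possible lead, sketched under assumptions rather than confirmed against the setup above: row indexing behaves differently for the two types, and apply() converts a data.frame to a matrix before calling the function. The snippet uses base R only, with rnorm() standing in for rmvnorm(); the names m, df, v_matrix, v_frame, t_matrix and t_frame are illustrative.

```r
# Hypothetical stand-in data; base R only, so mvtnorm is not required.
set.seed(1)
m  <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
df <- as.data.frame(m)

# A matrix row drops to a plain numeric vector ...
class(m[1, ])    # "numeric"
# ... while a data.frame row stays a one-row data.frame.
class(df[1, ])   # "data.frame"

# apply() converts a data.frame to a matrix before calling the function,
# so the function receives plain numeric vectors.
unique(apply(df, 1, function(row) class(row)))   # "numeric"

# Both forms give the same number for one pair of rows ...
v_matrix <- sum(abs(m[1, ] - m[2, ])) / ncol(m)
v_frame  <- sum(abs(df[1, ] - df[2, ])) / ncol(df)
all.equal(v_matrix, v_frame)   # TRUE

# ... but the cost per pair may differ; timings depend on the machine.
t_matrix <- system.time(for (k in 1:2000) sum(abs(m[1, ] - m[2, ])))
t_frame  <- system.time(for (k in 1:2000) sum(abs(df[1, ] - df[2, ])))
print(t_matrix)
print(t_frame)
```

If this is indeed the cause, calling as.matrix(data) once at the top of dist.for would presumably remove most of the gap; that is an assumption, not something measured on the original system.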
Could you explain why that happens?
Thank you,
regards,
Roberto