[R] Vectorizing integrate()
Berend Hasselman
bhh at xs4all.nl
Fri Dec 7 18:40:50 CET 2012
On 07-12-2012, at 18:12, Spencer Graves wrote:
> Has anyone suggested using the byte code compiler "compiler" package? An analysis by John Nash suggested to me that it may be roughly equivalent to vectorization; see "http://rwiki.sciviews.org/doku.php?id=tips:rqcasestudy&s=compiler".
>
>
Not yet.
But here are some results for alternative ways of doing what the OP wanted.
# Initial parameters
N <- 1000
B <- c(0,1)
sem1 <- runif(N, 1, 2)
x <- rnorm(N)
X <- cbind(1, x)
# load compiler package
library(compiler)
# functions
# Original loop solution with function fun defined outside loop
f1 <- function(X, B, x, sem1) {
eta <- numeric(nrow(X))
fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
for(j in 1:nrow(X)){
eta[j] <- integrate(fun, -Inf, Inf, m=x[j], s=sem1[j])$value
}
eta
}
f2 <- cmpfun(f1)
# sapply solution with fun defined outside function
fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
f3 <- function(X, B, x, sem1) sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
f4 <- cmpfun(f3)
# sapply solution with fun defined within function
f5 <- function(X, B, x, sem1) {
fun <- function(u, m, s) 1/ (1 + exp(- (B[1] + B[2] * (m + u)))) * dnorm(u, 0, s)
sapply(1:nrow(X), function(i) integrate(fun, -Inf, Inf, m=x[i], s=sem1[i])$value)
}
f6 <- cmpfun(f5)
# Testing
eta1 <- f1(X, B, x, sem1)
eta2 <- f2(X, B, x, sem1)
eta3 <- f3(X, B, x, sem1)
eta4 <- f4(X, B, x, sem1)
eta5 <- f5(X, B, x, sem1)
eta6 <- f6(X, B, x, sem1)
identical(eta1,eta2)
identical(eta1,eta3)
identical(eta1,eta4)
identical(eta1,eta5)
library(rbenchmark)
benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
replications=10, columns=c("test","elapsed","relative"))
# Results
> identical(eta1,eta2)
[1] TRUE
> identical(eta1,eta3)
[1] TRUE
> identical(eta1,eta4)
[1] TRUE
> identical(eta1,eta5)
[1] TRUE
>
> library(rbenchmark)
>
> benchmark(eta1 <- f1(X, B, x, sem1), eta2 <- f2(X, B, x, sem1), eta3 <- f3(X, B, x, sem1),
+ eta4 <- f4(X, B, x, sem1), eta5 <- f5(X, B, x, sem1), eta6 <- f6(X, B, x, sem1),
+ replications=10, columns=c("test","elapsed","relative"))
test elapsed relative
1 eta1 <- f1(X, B, x, sem1) 1.873 1.207
2 eta2 <- f2(X, B, x, sem1) 1.552 1.000
3 eta3 <- f3(X, B, x, sem1) 1.807 1.164
4 eta4 <- f4(X, B, x, sem1) 1.841 1.186
5 eta5 <- f5(X, B, x, sem1) 1.852 1.193
6 eta6 <- f6(X, B, x, sem1) 1.601 1.032
As you can see using the compiler package is beneficial speedwise.
f2 and f6, both the the result of using the compiler package, are the quickest.
It's quite likely that more can be eked out of this.
Berend
More information about the R-help
mailing list