[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling
Andreï V. Kostyrka
Fri Apr 12 16:52:11 CEST 2019
Dear all,
This is the first time I am posting to the r-devel list. On
Stack Overflow, it was suggested that the strange behaviour of integrate()
looks more like a bug. Below is a short version of the question (the full
one, with plots, is at https://stackoverflow.com/q/55639401).
Suppose one wants to integrate a function that is just a product of two
density functions (like gamma). The support of the random variable is
(-Inf, 0]. The scale parameter of the distribution is quite small
(around 0.01), so the standard integration routine often fails to
integrate a function that is non-negligible only on a very short stretch
of the negative half-line (e.g. [-0.02, -0.01], where it takes huge
values) and almost 0 everywhere else. R's integrate() would often return
the machine epsilon as the result. So I stretch the function around zero
by the inverse of the scale parameter, compute the integral, and then
multiply it by the scale to undo the change of variable. Sometimes this
rescaling also failed, so I did both whenever the first result was very
small.
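The strategy just described can be sketched in R as follows. integrate_with_fallback() and its `tiny` threshold are hypothetical names, not part of R, and trying the unscaled integrand first is an assumption about the order. The rescaling relies on the substitution x = s * u (s > 0), under which the integral of g(x) over (-Inf, 0] equals the integral of s * g(s * u) over the same range.

```r
# Hypothetical helper: integrate g over [lower, upper] directly and, if the
# result is smaller than `tiny`, retry after the substitution x = s * u.
# Assumes s > 0 so that the transformed bounds keep their order.
integrate_with_fallback <- function(g, s, lower = -Inf, upper = 0,
                                    tiny = 1e-10,
                                    rel.tol = .Machine$double.eps^0.25) {
  # First attempt: the unscaled integrand (the order is an assumption).
  direct <- integrate(g, lower, upper, rel.tol = rel.tol)$value
  if (direct >= tiny) return(direct)
  # Rescaled integrand: x = s * u gives dx = s du, so the integral of g over
  # [lower, upper] equals the integral of s * g(s * u) over [lower/s, upper/s].
  h <- function(u) s * g(s * u)
  integrate(h, lower / s, upper / s, rel.tol = rel.tol)$value
}

# Usage with an illustrative integrand whose exact integral is 1.
g <- function(x) dgamma(-x, shape = 5.704, scale = 0.5)
integrate_with_fallback(g, s = 0.5)
```

A smallness check of this kind cannot catch a result that is wrong but not small, such as the 1.5 in the example that follows.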
Today, integrating the rescaled function suddenly yielded a value of 1.5
instead of about 3.575 (not even zero). The MWE is below.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
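# Product of two gamma densities; numstab = s substitutes x = s * u and
# multiplies by s, so the integral over (-Inf, 0] is the same for any s > 0
f <- function(x, numstab = 1)
  dgamma(cons - x * numstab, shape = sh, scale = sc) *
  dgamma(-x * numstab, shape = sh, scale = sc) * numstab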
curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value : num 3.58
# $ abs.error : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value : num 1.5 # What?!
# $ abs.error : num 0.000145 # What?!
# $ subdivisions: int 2
It stops after just two subdivisions! The problem is that I cannot try
various stabilising multipliers for the function, because I have to
compute this integral thousands of times for thousands of parameter
values on thousands of sample windows for hundreds of models; even on a
supercomputer cluster, this takes weeks. Besides that, reducing rel.tol
to just 1e-5 or 1e-6 helped a bit, but I am not sure whether this
guarantees success (and reducing it to 1e-7 slowed down the computations
in some cases). I have also looked at the Fortran code of the quadrature,
just to see the integration rule.
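For reference, R's integrate() is based on the QUADPACK routines dqags and dqagi, and for an infinite range dqagi maps the range onto (0, 1]. The sketch below assumes the mapping QUADPACK documents for a range (-Inf, b], x = b - (1 - t)/t with dx = dt/t^2, applies it to the scaled integrand from the MWE above, and locates where its mass ends up in t. The names b, g, n, t_mid and t_range are illustrative only.

```r
# Parameters and integrand from the MWE above.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1)
  dgamma(cons - x * numstab, shape = sh, scale = sc) *
  dgamma(-x * numstab, shape = sh, scale = sc) * numstab

# Assumed dqagi-style mapping of (-Inf, b] onto (0, 1]:
# x = b - (1 - t) / t, so dx = dt / t^2.
b <- 0
g <- function(t) f(b - (1 - t) / t, numstab = sc) / t^2

# Fine midpoint rule on (0, 1): never evaluates t = 0, and checks that the
# mapped integrand has the same integral as the original one.
n <- 1e6
t_mid <- (seq_len(n) - 0.5) / n
g_mid <- g(t_mid)
mapped_integral <- sum(g_mid) / n

# Where the mapped integrand peaks, and the band of t holding its bulk.
t_peak <- t_mid[which.max(g_mid)]
t_range <- range(t_mid[g_mid > 1e-3 * max(g_mid)])
print(c(integral = mapped_integral, t_peak = t_peak))
print(t_range)
```

If the printed band is narrow compared with (0, 1], one plausible (unverified) explanation for the early stop is that the 15-point Gauss-Kronrod rule applied on (0, 1] and its first halves places only a few nodes on the peak, so the error estimate can be misleadingly small.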
So my question is: how can I make sure that the integration routine does
not produce such wrong results for this kind of function, while keeping
the integration fast?
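A minimal sketch of one possible workaround, assuming the support can be bounded analytically: for numstab = 1 the integrand is exactly zero for x >= cons (because cons < 0), and the lower end can be cut at a far upper quantile of the gamma distribution, so integrate() works on a short finite interval instead of (-Inf, 0]. The cutoff probability eps is an illustrative choice, not a tuned value, and this is not a guarantee of correctness for every parameter set.

```r
# Parameters and integrand from the MWE above.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1)
  dgamma(cons - x * numstab, shape = sh, scale = sc) *
  dgamma(-x * numstab, shape = sh, scale = sc) * numstab

# Upper bound: dgamma(cons - x, ...) is zero unless x < cons, and cons < 0,
# so nothing is lost by stopping at cons instead of 0.
upper <- cons

# Lower bound: cut where -x exceeds the upper eps-quantile of the gamma law.
# The discarded part is at most max(dgamma(., sh, scale = sc)) * eps.
eps <- 1e-15  # illustrative tail probability (assumption)
lower <- -qgamma(eps, shape = sh, scale = sc, lower.tail = FALSE)

res <- integrate(f, lower, upper)
str(res)
```

Because the bounds depend only on cons, sh and sc, they can be computed once per parameter set at negligible cost compared with the integration itself.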
Yours sincerely,
Andreï V. Kostyrka