
I am trying to use R to evaluate an integral.

Here is the function I am trying to evaluate:

# Define the marginal likelihood function (which we will then integrate)
marginal <- function(z, se, alpha, beta, noise) {
    log_result <- log(1 / ((z^-1 * se - 4)^0.5 * z^-1 * se^(3 / 2))) +
        log(dbeta(0.5 - (z^-1 * se - 4)^0.5 / (2 * (z^-1 * se)^0.5), alpha, beta) +
            dbeta(0.5 + (z^-1 * se - 4)^0.5 / (2 * (z^-1 * se)^0.5), alpha, beta)) +
        dgamma(z, shape = 1 / noise, scale = noise, log = TRUE) +
        log(z^-1)

    return(exp(log_result))
}

integrate( marginal, lower = 0, upper = 0.9584705, se = 3.833882, alpha = 4.568112, beta = 4.830437, noise = 1e-4)$value

However, integrate() reports that the integral is probably divergent. This isn't the case; furthermore, the function can be evaluated at every value between 0 and my upper bound at ~0.96.

I took a look at what the function looks like and here is an example:

Plot of my function I wish to evaluate

Interestingly, if I change the lower bound to something like 0.8, I now get a finite answer, 0.0001505723. Clearly, because my function is essentially zero for so long and only peaks near the boundary, R's quadrature is going wrong somewhere.

Unfortunately, because I am doing these integrals over different values of alpha/beta/noise/se etc., the extent of this flat region changes. Sometimes the integrand peaks at the boundary, as in the attached image, and sometimes it peaks in the middle of the interval (where the upper bound depends on the data input), so I cannot simply fix the lower bound at 0.8. For computational reasons I would also prefer not to write a test for where this lower bound should be, as that feels quite ad hoc and not a suitable solution for my purposes.

Is there some other better way to evaluate this integral?

R does very simple numeric integration, which under certain extreme cases like this one may not be the best way to calculate values. There's no computational method that's perfect for every function, I'm afraid. – MrFlick
Can you recommend (ideally in R) a solution to evaluate this integral, then? – euler
Can you come up with a reasonable algorithmic strategy for identifying approximately where the lower bound should be? (I know you said you don't want to have to "write a test" for this, but if there is some reasonably theoretically grounded approach based on the shape of the function ...) – Ben Bolker
@BenBolker the lower bound will always be at 0. But effectively this sometimes means that the function stays at zero for a long stretch until it reaches some inflection point, and then the peak happens. – euler

1 Answer


This is a fundamentally difficult problem (but tl;dr, pracma::integral() seems to do a reasonable job): see e.g. answers to this question.

  • If it is at all possible for you to gain an analytical understanding of the function, so that you can compute a reasonable lower bound fairly directly from the parameters (i.e. determine over approximately what range the function has non-trivial mass), that would be the most robust approach. Of course, it requires detailed analysis of your specific function (which I'm not bothering to do ...)
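Short of a closed-form bound, a crude automatic stand-in (echoing the comment thread above) is to scan the integrand on a grid and start integrating from the first point carrying non-trivial mass. This is only a sketch; find_lower is a hypothetical helper, not from any package, and the grid resolution and threshold fraction are arbitrary choices:

```r
# Hypothetical helper: scan fun on a grid over [lower, upper] and return
# the first grid point where it exceeds a tiny fraction of its peak value.
# Non-finite evaluations (e.g. at z = 0) are treated as zero mass.
find_lower <- function(fun, lower, upper, ..., n = 1000, frac = 1e-12) {
    grid <- seq(lower, upper, length.out = n)
    vals <- vapply(grid, function(x) fun(x, ...), numeric(1))
    vals[!is.finite(vals)] <- 0
    grid[which(vals > frac * max(vals))[1]]
}
```

You could then call integrate() from find_lower(marginal, 0, 0.9584705, ...) instead of from 0, at the cost of n extra function evaluations per integral.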
  • For some cases, though I think not this one, I would suggest recasting the numerical computation to minimize numerical instability/underflow. Something as simple as multiplying by a large constant often works, but here your function spans too wide a range: if we scaled the smallest values up above the underflow level for floating-point numbers (approx 1e-308), the largest values would then exceed the overflow level (approx 1e308). There are ways of computing the log sum of exponentials, but we probably don't want to get into that here. (PS: I tried it with matrixStats::logSumExp, but either I got something wrong or it doesn't work perfectly either ...)
curve(marginal(x, se = 3.833882, alpha = 4.568112, beta = 4.830437,
               noise = 1e-4), from = 0, to = 1, n = 501, log = "y")
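For completeness, here is what a log-sum-exp style quadrature might look like. This is only a sketch on a fixed grid (so it has none of integrate()'s adaptive error control), and it assumes you refactor your function so that log_fun returns log_result directly, i.e. marginal() without the final exp():

```r
# Trapezoidal rule in the log domain: evaluate the *log* integrand,
# subtract its maximum before exponentiating (so exp() stays in [0, 1]),
# and add the maximum back at the end.  Returns the log of the integral.
log_trapz <- function(log_fun, lower, upper, ..., n = 2001) {
    grid <- seq(lower, upper, length.out = n)
    lv <- vapply(grid, function(x) log_fun(x, ...), numeric(1))
    lv[!is.finite(lv)] <- -Inf        # treat NaN / -Inf points as zero mass
    m <- max(lv)
    w <- exp(lv - m)                  # safely representable weights
    h <- diff(grid)
    m + log(sum(h * (w[-1] + w[-n]) / 2))
}
```

This sidesteps underflow/overflow entirely, at the price of trading integrate()'s adaptivity for a fixed n.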

curve of function

  • One hacky solution is to divide your integral into a piecewise sum; as long as integrate() gets all of the non-trivial pieces right, you're good to go. Note, however, that this strategy is not robust either: for cuts <= 5 it says the integral is divergent, and for cuts >= 8 it misses a non-trivial component.
piecewise_int <- function(fun, lower, upper, cuts = 10, ...) {
    values <- numeric(cuts)      # one value per sub-interval
    cvec <- seq(lower, upper, length.out = cuts + 1)
    for (i in seq(cuts)) {
        values[i] <- integrate(fun, lower = cvec[i], upper = cvec[i + 1],
                               ...)$value
    }
    print(values)                # show the per-piece contributions
    return(sum(values))
}
piecewise_int( marginal, lower = 0, upper = 0.9584705, se = 3.833882, 
               alpha = 4.568112, beta = 4.830437, noise = 1e-4, cuts=7)
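One way to soften the "divergent for some cuts" failure mode is to wrap each piece in tryCatch so a single failing sub-interval doesn't abort the whole sum. This is a sketch of a variant, not a cure: it silently treats failed pieces as contributing zero, which is only safe if those pieces genuinely carry negligible mass.

```r
# Variant of piecewise_int that tolerates individual failing pieces:
# any sub-interval where integrate() errors out contributes 0 to the sum.
safe_piecewise_int <- function(fun, lower, upper, cuts = 10, ...) {
    cvec <- seq(lower, upper, length.out = cuts + 1)
    pieces <- vapply(seq_len(cuts), function(i) {
        tryCatch(integrate(fun, lower = cvec[i], upper = cvec[i + 1],
                           ...)$value,
                 error = function(e) 0)
    }, numeric(1))
    sum(pieces)
}
```

It is worth printing or inspecting which pieces failed before trusting the result.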
  • Finally, pracma::integral() seems to do a better job — but I don't think there is any guarantee that it will always succeed, or even necessarily that it will always do better than one of the strategies above ...
pracma::integral( marginal, xmin = 0, xmax = 0.9584705, se = 3.833882, 
                  alpha = 4.568112, beta = 4.830437, noise = 1e-4)