Background
I need to randomly sample from a fairly complex probability density function (PDF) with a known cumulative distribution function (CDF) and I'm trying to use inverse transform sampling. That should be easy to do, since I have the CDF and just need to numerically invert it (not possible to do algebraically) while plugging in uniform random numbers. However, the resulting distribution has a lower variance than expected and I can not find any mistake in the CDF.
So I simplified and tested my algorithm by sampling from a normal distribution. The result was the same: location is ok, but the scale is wrong. I'm aware that there are better and built-in methods for Gaussian sampling, but this is just a test of the sampling algorithm.
The problem originally arose in Fortran, but I have since replicated the problem in Python, so I have to do something fundamentally wrong or have numerical problems.
Python
import numpy as np
from scipy.special import erf
from scipy.optimize import brentq
import matplotlib.pyplot as plt
from scipy.stats import norm
def testfunc(x):
## Test case, result should be 6.04880103
# out = 0.5 * (1. + erf((x - 5.) / (2. * np.sqrt(2.)))) - 0.7
r = np.random.uniform()
# hand-built cdf:
# out = 0.5 * (1. + erf((x - 5.) / (2. * np.sqrt(2.)))) - r
# scipy cdf:
out = norm.cdf(x, 5, 2) - r
return out
if __name__ == '__main__':
n = 10000
sol_array = np.zeros(n)
for i in range(0, n):
sol_array[i] = brentq(testfunc, -100.,100.)
print('mean = ' + str(np.mean(sol_array)))
print('std = ' + str(np.std(sol_array)))
plt.hist(sol_array, normed=True, bins='fd')
x = np.linspace(-1, 11, 1000)
plt.plot(x, norm.pdf(x, 5, 2))
plt.show()
The mean of the sampled values is about 5, as expected, but the standard deviation is about 1.28, where it should be 2, for both my hand-built CDF and the CDF of scipy.
This is also visible in the histogram:

Fortran
The same problem in Fortran, although with a different value for the resulting standard deviation. The code is longer because the solver is included. That solver is a translated Fortran 90 version by Alan Miller of an old FORTRAN 77 netlib-routine (zeroin.f).
implicit none
integer, parameter :: dp = selected_real_kind(15, 307)
integer, parameter :: n = 1000000
real, dimension(n) :: v
real :: mean, std
integer, dimension(:), allocatable :: seed
integer :: i, seedsize, clock
! seed the PRNG
call random_seed(size=seedsize)
allocate(seed(seedsize))
call system_clock(count=clock)
seed=clock + 37 * (/ (i - 1, i=1, seedsize) /)
call random_seed(put=seed)
deallocate(seed)
do i = 1, n
v(i) = real(zeroin(testfunc, -100._dp, 100._dp, 1e-20_dp, 1e-10_dp))
end do
mean = sum(v) / n
std = sum((v - mean)**2) / n
print*, mean, std
contains
function testfunc(v)
implicit none
real(dp), intent(in) :: v
real(dp) :: testfunc, r
call random_number(r)
! testfunc = 0.5 * (1. + erf((v-5.)/(2.*sqrt(2.)))) - 0.7 ! should be 6.04880
testfunc = 0.5 * (1. + erf((v-5.)/(2.*sqrt(2.)))) - r ! Gaussian test with mu=5 and sigma=2
end function testfunc
function zeroin(f, ax, bx, aerr, rerr) result(fn_val)
! original zeroin.f from netlib.org
! code converted using to_f90 by alan miller
! date: 2003-07-14 time: 12:32:54
!-----------------------------------------------------------------------
! finding a zero of the function f(x) in the interval (ax,bx)
! ------------------------
! INPUT:
! f function subprogram which evaluates f(x) for any x in the
! closed interval (ax,bx). it is assumed that f is continuous,
! and that f(ax) and f(bx) have different signs.
! ax left endpoint of the interval
! bx right endpoint of the interval
! aerr the absolute error tolerance to be satisfied
! rerr the relative error tolerance to be satisfied
! OUTPUT:
! abcissa approximating a zero of f in the interval (ax,bx)
!-----------------------------------------------------------------------
! zeroin is a slightly modified translation of the algol procedure
! zero given by Richard Brent in "Algorithms for Minimization without
! Derivatives", Prentice-Hall, Inc. (1973).
implicit none
real(dp), intent(in) :: ax
real(dp), intent(in) :: bx
real(dp), intent(in) :: aerr
real(dp), intent(in) :: rerr
real(dp) :: fn_val
real(dp) :: a, b, c, d, e, eps, fa, fb, fc, tol, xm, p, q, r, s, atol, rtol
interface
real(selected_real_kind(15, 307)) function f(x)
real(selected_real_kind(15, 307)), intent(in) :: x
end function f
end interface
! compute eps, the relative machine precision
eps = epsilon(0.0_dp)
! initialization
a = ax
b = bx
fa = f(a)
fb = f(b)
if (fb*fa > 0.) then
print*, 'a, b, fa, fb', a, b, fa, fb
stop
end if
atol = 0.5 * aerr
rtol = max(0.5_dp*rerr, 2.0_dp*eps)
! begin step
10 c = a
fc = fa
d = b - a
e = d
20 if (abs(fc) < abs(fb)) then
a = b
b = c
c = a
fa = fb
fb = fc
fc = fa
end if
! convergence test
tol = rtol * max(abs(b),abs(c)) + atol
xm = 0.5 * (c-b)
if (abs(xm) > tol) then
if (fb /= 0.0) then
! is bisection necessary
if (abs(e) >= tol) then
if (abs(fa) > abs(fb)) then
! is quadratic interpolation possible
if (a == c) then
! linear interpolation
s = fb / fc
p = (c-b) * s
q = 1.0 - s
else
! inverse quadratic interpolation
q = fa / fc
r = fb / fc
s = fb / fa
p = s * ((c-b)*q*(q-r)-(b-a)*(r-1.0))
q = (q-1.0) * (r-1.0) * (s-1.0)
end if
! adjust signs
if (p > 0.0) q = -q
p = abs(p)
! is interpolation acceptable
if (2.0*p < (3.0*xm*q-abs(tol*q))) then
if (p < abs(0.5*e*q)) then
e = d
d = p / q
go to 30
end if
end if
end if
end if
! bisection
d = xm
e = d
! complete step
30 a = b
fa = fb
if (abs(d) > tol) b = b + d
if (abs(d) <= tol) b = b + sign(tol,xm)
fb = f(b)
if (fb*(fc/abs(fc)) > 0.0) go to 10
go to 20
end if
end if
! done
fn_val = b
end function zeroin
end
The mean of the resulting samples is about 5, while the standard deviation is about 1.64.
Question
Does anyone have an idea where my algorithm could become numerically problematic? The fact that the Python version and the Fortran version both have the same problem, but to different degrees makes me believe that it's some rounding of floating point numbers problem, but I cannot image where. Even if the solver returns a rounded value, that difference should not show up in a simple histogram.
Does anyone see a mistake in my algorithms? Am I understanding something wrong?