OK, here is a quick way to a solution (if you want to use a truncated Gaussian). Set the boundaries and the desired stddev; I assume the mean is 0. Below is quick-and-crude code that does a binary search for the distribution sigma, i.e. solves for a non-linear root (brentq() should be used in production code). All formulas are taken from the Wikipedia page on the Truncated Normal distribution. The resulting sigma will be larger than the desired stddev, because truncation removes the tail values that contribute most to the stddev. Then we do a quick sampling test: the sample mean and stddev are close to the desired values but never exactly equal to them. Code (Python 3.7, Anaconda, Win10 x64):
```python
import numpy as np
from scipy.special import erf
from scipy.stats import truncnorm

def alpha(a, sigma):
    return a/sigma

def beta(b, sigma):
    return b/sigma

def xi(x, sigma):
    return x/sigma

def fi(xi):
    # standard normal PDF
    return 1.0/np.sqrt(2.0*np.pi) * np.exp(-0.5*xi*xi)

def Fi(x):
    # standard normal CDF
    return 0.5*(1.0 + erf(x/np.sqrt(2.0)))

def Z(al, be):
    return Fi(be) - Fi(al)

def Variance(sigma, a, b):
    # variance of N(0, sigma^2) truncated to [a, b]
    al = alpha(a, sigma)
    be = beta(b, sigma)
    ZZ = Z(al, be)
    return sigma*sigma*(1.0 + (al*fi(al) - be*fi(be))/ZZ - ((fi(al)-fi(be))/ZZ)**2)

def stddev(sigma, a, b):
    return np.sqrt(Variance(sigma, a, b))

m = 0.0 # mean
s = 1.0 # this is what we want
a = -3.0 # left boundary
b = 3.0 # right boundary

#print(stddev(s , a, b))
#print(stddev(s + 0.1, a, b))

slo = 1.0
shi = 1.1

stdlo = stddev(slo, a, b)
stdhi = stddev(shi, a, b)

sigma = -1.0

while True: # binary search for sigma
    sme = (slo + shi) / 2.0
    stdme = stddev(sme, a, b)
    if stdme - s == 0.0:
        sigma = sme # exact hit: the midpoint itself is the answer
        break
    elif stdme - s < 0.0:
        slo = sme
    else:
        shi = sme

    if shi - slo < 0.0000001:
        sigma = (shi + slo) / 2.0
        break

print("sigma =", sigma) # we got it, shall be slightly bigger than s, desired stddev

# quick sampling test; truncnorm expects its bounds in standardized units, (bound - loc)/scale
rvs = truncnorm.rvs((a - m)/sigma, (b - m)/sigma, loc=m, scale=sigma, size=1000000)

print("mean =", np.mean(rvs))
print("stddev =", np.std(rvs))
```
For me it printed
sigma = 1.0153870105743408
mean = -0.000400729471992301
stddev = 1.0024267696681475
With a different seed or sequence length you will get slightly different output.
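Since brentq() is the recommended tool for the root-finding step, here is a minimal sketch of how that could look. It is not the code that produced the numbers above: it swaps the hand-written variance formula for SciPy's `truncnorm.std`, and the helper names `truncated_std` and `solve_sigma` as well as the search bracket are my own assumptions. The bracket `[target_std, 10 * target_std]` only contains a root when the desired stddev is actually reachable within the bounds (for symmetric bounds it has to be below the uniform limit `(b - a) / sqrt(12)`).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import truncnorm

def truncated_std(sigma, a, b, m=0.0):
    """Stddev of N(m, sigma^2) truncated to [a, b], computed by SciPy."""
    # truncnorm takes its bounds in standardized units: (bound - loc) / scale
    return truncnorm.std((a - m) / sigma, (b - m) / sigma, loc=m, scale=sigma)

def solve_sigma(target_std, a, b, m=0.0):
    """Find the parent sigma whose truncated distribution has stddev target_std."""
    # Assumed bracket: at sigma = target_std the truncated stddev is too small,
    # at 10 * target_std it is assumed to be too large (true for a = -3, b = 3).
    lo, hi = target_std, 10.0 * target_std
    return brentq(lambda sig: truncated_std(sig, a, b, m) - target_std,
                  lo, hi, xtol=1e-12)

if __name__ == "__main__":
    sigma = solve_sigma(1.0, -3.0, 3.0)
    print("sigma =", sigma)
    print("truncated stddev =", truncated_std(sigma, -3.0, 3.0))
```

For a = -3, b = 3 and a desired stddev of 1 this should land very close to the sigma found by the binary search, without the hand-rolled loop.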
"a mean of 0"? The distribution mean is different from the sample mean. If I sample from a plain Gaussian, N(0,1), with no bounds, then even though the distribution mean is 0, the sample mean will differ from 0 and will approach 0 only as the number of samples goes to infinity. Do you want the distribution mean to be zero? Or do you want the sample mean (actually, the sum) of any sampled sequence to be exactly zero all the time? Those two conditions are quite different. – Severin Pappadeux
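To illustrate the difference between the distribution mean and the sample mean, here is a small sketch that draws from N(0,1) at a few sample sizes and prints the sample mean next to 1/sqrt(n), the standard error of the mean. The helper name `sample_means`, the seed, and the sample sizes are arbitrary choices made for this illustration.

```python
import numpy as np

def sample_means(sizes, rng):
    """Return the sample mean of N(0, 1) draws for each requested sample size."""
    return [rng.standard_normal(n).mean() for n in sizes]

if __name__ == "__main__":
    rng = np.random.default_rng(12345)  # arbitrary seed, only for repeatability
    sizes = [100, 10_000, 1_000_000]
    for n, mean in zip(sizes, sample_means(sizes, rng)):
        # The standard error of the mean is 1/sqrt(n), so |mean| typically
        # shrinks at that rate but does not become exactly zero.
        print(f"n = {n:>9}: sample mean = {mean:+.6f}, 1/sqrt(n) = {1/np.sqrt(n):.6f}")
```

The sample mean gets closer to 0 as n grows, but for any finite sample it is essentially never exactly 0, which is why the two requirements in the question above need to be told apart.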