0
votes

I am trying to randomly sample n = 15 integers from 3 to 7 (inclusive) such that the 15 sampled values average to exactly 5.

So far I've only managed to randomly sample n integers from a given range, like this:

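import numpy as np

n = 15
low  = 3
high = 8  # exclusive upper bound, so the values are 3..7
values = list(range(low, high))  # avoid shadowing the built-ins range and list
sample = np.random.choice(values, n)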

This generates 15 random integers drawn from 3, 4, 5, 6, 7. However, I want the values sampled randomly while the average of the n values equals 5. How can I go about this?

2
You cannot have both. If you take a random sample you will get a random average. You can come up with a process to get the numbers you want, but it won't be a random sample. What do you need this for? - jbch
@jbch Thank you. Yeah, I realized it's not really random if I fix the average to a certain k. I'm trying to generate random timings within a given range, but with the overall total fixed. I want to make sure the numbers are sampled around the average k while still having random values from 3, 4, 5, 6, 7 for later permutations of the list. - Vinci
For clarification, I think the question can be refined to: sample n integers from range(x, y) that average to mean k (so without truly random sampling). - Vinci

2 Answers

2
votes

I'm not sure this suits your needs but it's one way to do it.

import random

def kind_of_random(low, high, k, n):
    """
    Generate a list of n numbers between low and high
    with a mean of exactly k.
    """  
    assert low <= k <= high  # k must be an integer within [low, high]
    values = [low] * n
    to_add = (k - low) * n
    for _ in range(to_add):
        i = random.randint(0, n-1)
        # Don't want to add to a value that's already the max
        while values[i] == high:
            i = random.randint(0, n-1)
        values[i] += 1
    return values

ns = kind_of_random(3, 7, 5, 15)

I'm sure there are more efficient variations on this idea. I think you could do it faster by starting from n * [k] instead of n * [low] and doing some number of paired add/subtract operations, but this should be good enough for small n.
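The paired add/subtract variation mentioned above could look roughly like the following. This is only a sketch: the function name `paired_shuffle` and the `steps` parameter are made up for illustration, and it assumes k is an integer and n is at least 2. Each move adds 1 to one position and subtracts 1 from another, so the sum (and therefore the mean) never changes.

```python
import random

def paired_shuffle(low, high, k, n, steps=1000):
    """Hypothetical helper: start from n copies of k, then apply paired +1/-1 moves."""
    if not (low <= k <= high):
        raise ValueError("k must lie within [low, high]")
    values = [k] * n
    for _ in range(steps):
        i, j = random.sample(range(n), 2)  # two distinct positions
        # Move one unit from position j to position i,
        # but only if both values stay inside [low, high].
        if values[i] < high and values[j] > low:
            values[i] += 1
            values[j] -= 1
    return values

ns = paired_shuffle(3, 7, 5, 15)
print(ns, sum(ns) / len(ns))
```

How many steps are needed for the result to look well mixed is not something this sketch tries to answer; 1000 is an arbitrary choice.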

1
votes

Well, we could use a distribution whose outcomes naturally sum to a known value; with a fixed number of samples, the mean is then fixed as well. A mean of 5 with 15 samples means the total must always equal 75.

The simplest such distribution is the multinomial, so let's use it from NumPy. We set all 15 probabilities equal to 1/15 and distribute 30 units (75 minus 15 × 3) among the 15 slots, so each count lies in [0...30]. We reject any sample in which a count exceeds the width of the desired range (7 − 3 = 4), then add 3 to every count.

It is faster than the method proposed by @jbch, needs no manual balancing of sums and means, and its histogram is closer to symmetric, if you care about that.
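To see why the mean comes out fixed, here is a minimal check of the arithmetic, separate from the full code. The variable names `n_units`, `counts` and `in_range` are just for this illustration. A single multinomial draw always distributes exactly `n_units` among the slots, so after shifting by `start` the total is always 75; only the per-value upper bound needs the rejection step.

```python
import numpy as np

N, mean, start, stop = 15, 5, 3, 7
n_units = N * (mean - start)  # 30 units to distribute over 15 slots

# One draw: the counts always sum to n_units, whatever the individual values are
counts = np.random.multinomial(n_units, np.full(N, 1.0 / N))
sample = counts + start       # shift so the smallest possible value is 3

print(counts.sum(), sample.sum(), sample.mean())

# This particular draw is usable only if no count exceeds stop - start
in_range = bool((counts <= stop - start).all())
print(in_range)
```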

Code

import numpy as np

def multiSum(n, p, maxv):
    """Draw multinomial counts summing to n; retry until no count exceeds maxv."""
    while True:
        q = np.random.multinomial(n, p)
        if np.all(q <= maxv): # accept only samples below or equal to maxv
            return q

N = 15
p = np.full((N), 1.0/np.float64(N))

mean  = 5
start = 3
stop  = 7
n = N*mean - N*start

h = np.zeros((5), dtype=np.int64)
print(h)
for k in range(0, 10000):
    ns = multiSum(n, p, stop-start) + start # result in [3...7]
    #print(np.mean(ns))
    for v in ns:
        h[v-start] += 1

print(h)

Typical output histogram on my computer (counts of the values 3, 4, 5, 6, 7):

[15698 38107 44584 33719 17892]

Output histogram for @jbch's method:

[17239 39237 42188 28957 22379]