So I had this statistics homework and I wanted to do it with python and numpy.
The question started with making of 1000 random samples which follow normal distribution.
random_sample=np.random.randn(1000)
Then it wanted to divided these numbers to some subgroups . for example suppose we divide them to five subgroups.first subgroup is random numbers in range of (-5,-3)and it goes on to the last subgroup (3,5).
Is there anyway to do it using numpy (or anything else)?
And If it's possible I want it to work when the number of subgroups are changed.
0
votes
2 Answers
0
votes
You can get subgroup indices using numpy.digitize:
random_sample = 5 * np.random.randn(10)
random_sample
# -> array([-3.99645573, 0.44242061, 8.65191515, -1.62643622, 1.40187879,
# 5.31503683, -4.73614766, 2.00544974, -6.35537813, -7.2970433 ])
indices = np.digitize(random_sample, (-3,-1,1,3))
indices
# -> array([0, 2, 4, 1, 3, 4, 0, 3, 0, 0])
0
votes
If you sort your random_sample, then you can divide this array by finding the indices of the "breakpoint" values — the values closest to the ranges you define, like -3, -5. The code would be something like:
import numpy as np
my_range = [-5,-3,-1,1,3,5] # example of ranges
random_sample = np.random.randn(1000)
hist = np.sort(random_sample)
# argmin() will find index where absolute difference is closest to zero
idx = [np.abs(hist-i).argmin() for i in my_range]
groups=[hist[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Now groups is a list where each element is an array with all random values within your defined ranges.