
I have two numpy arrays, X (float) and GX (int), where GX stores the frequency of the corresponding X value. I want to bin X, summing the matching GX values in each bin, and plot a histogram with the bins on the x-axis and the frequency on the y-axis. I have tried pandas' qcut and cut as well as matplotlib's histogram, but none of them seems to work. I have also built the bins and frequencies from scratch with numpy, but all I can get is a scatter plot.

bins   = np.linspace(min(X), max(X),100)
freq   = []
countl = 0
for i in range(len(bins)-1):
    count = 0
    for j in range(len(X)):
        if bins[i]<X[j]<bins[i+1]:
            count += np.sum(GX[np.where(X==X[j])])
    freq.append(count)
for j in X:
    if bins[-2]<j<bins[-1]:
        countl += np.sum(GX[np.where(X==j)])

freq.append(countl)
plt.figure(figsize=(7,7))
plt.scatter(bins,freq,c='b')

Instead of the scatter plot, how can I get a bar graph or histogram (and preferably a better way to bin the values)?

1 Answer

Since your code already computes the frequency for each bin, a histogram is just a bar plot of those values:

# align='edge' starts each bar at its left bin edge instead of centering it there
plt.bar(bins, freq, width=bins[1]-bins[0], align='edge', color='crimson', ec='black')

Note that the test bins[i] < X[j] < bins[i+1] leaves out any X value that is exactly equal to a bin boundary. Such equality is unlikely in most situations, except for the minimum and the maximum of X, which always coincide with the first and last boundary. Therefore, bins[i] <= X[j] < bins[i+1] would be safer. To also catch the maximum, you could extend the last boundary by a small epsilon, e.g. bins = np.linspace(min(X), max(X)+0.000001, 100). Choose epsilon relative to the magnitude of X: small enough not to distort the bins, but large enough that max(X) + epsilon is still distinguishable from max(X) in floating point, so that the < test succeeds.
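As an alternative to the explicit loops, the binning itself can probably be left to np.histogram, which accepts a weights argument so that each X value contributes its GX frequency rather than a count of 1. The sketch below uses made-up sample data in place of your X and GX (the random generator and array sizes are my assumptions, not from the question). np.histogram treats every bin as half-open except the last one, which is closed, so the maximum of X is included without any epsilon.

```python
import numpy as np

# Hypothetical sample data standing in for the question's X (float) and GX (int).
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
GX = rng.integers(1, 10, size=1000)

# 99 bins give 100 edges, matching np.linspace(min(X), max(X), 100).
# weights=GX makes each X value add its frequency to the bin it falls in.
counts, edges = np.histogram(X, bins=99, weights=GX)

# Every frequency lands in exactly one bin, so the totals agree.
print(counts.sum(), GX.sum())
```

The resulting arrays can then be drawn with plt.bar(edges[:-1], counts, width=np.diff(edges), align='edge'), since counts has one entry per bin and edges[:-1] holds the left edge of each bin.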

Alternatively, provided the sum of GX isn't so large that it causes memory problems, you could use np.repeat to repeat each element of X as many times as its GX value indicates. Matplotlib can then calculate the histogram in the usual way:

all_X = np.repeat(X, GX)
plt.hist(all_X, bins=100, color='crimson', ec='black')
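If memory is a concern, plt.hist also takes a weights argument, which should give the same picture without materializing the repeated array. This is a minimal sketch with made-up sample data (the generator, sizes and the Agg backend line are my assumptions); the frequencies are kept at 1 or more so that the bin range matches the np.repeat version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data standing in for the question's X (float) and GX (int).
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
GX = rng.integers(1, 10, size=1000)

fig, ax = plt.subplots(figsize=(7, 7))
# weights=GX counts each X value GX times, without building np.repeat(X, GX).
counts, edges, _ = ax.hist(X, bins=100, weights=GX, color='crimson', ec='black')
ax.set_xlabel('X')
ax.set_ylabel('frequency')
# Call plt.show() or fig.savefig(...) here to view the result.
```

One difference to keep in mind: if some entries of GX are 0, np.repeat drops those X values entirely, while the weighted version still uses them to determine the bin range.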