0
votes

I have a question about the normed option of numpy.histogram. The function signature is:

numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)

According to the numpy.histogram documentation:

normed : bool, optional

This keyword is deprecated in Numpy 1.6 due to confusing/buggy behavior. It will be removed in Numpy 2.0. Use the density keyword instead. If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that this latter behavior is known to be buggy with unequal bin widths; use density instead.

I tried it with this code:

   from numpy import histogram

   imhist, bins = histogram([0,1,2,3], bins=4, normed=True)
   print("normed=True:", imhist)
   print("bins:", bins)

Output:

   normed=True: [ 0.33333333  0.33333333  0.33333333  0.33333333]
   bins: [ 0.    0.75  1.5   2.25  3.  ]

   imhist, bins = histogram([0,1,2,3], bins=4)
   print("normed=None:", imhist)
   print("bins:", bins)

Output:

   normed=None: [1 1 1 1]
   bins: [ 0.    0.75  1.5   2.25  3.  ]

What confuses me is the normed=True case, where "the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1." I thought imhist should be this:

   normed=True: [0.25  0.25  0.25  0.25]

The 4 values fall evenly into the 4 bins, one value per bin, which is why the result without normalization is [1 1 1 1]:

   Value:[ 0         1      2      3   ]
   bins: [ 0.    0.75  1.5   2.25  3.  ]

I have referred to the post "How does numpy.histogram() work?", but it does not cover the normed=True option.

2
Are you confusing "integral over the range" with "sum"? - DSM
@DSM I am confused by the imhist it returns. I expected [0.25 0.25 0.25 0.25], not [ 0.33333333 0.33333333 0.33333333 0.33333333]. How come it returns this? 0.33333333 = 1/3, which looks like the probability of one value out of 3 values. - Liao Zhuodi

2 Answers

2
votes

The docs don't say that it returns values that sum to 1. They say:

If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1.

So in your case, it's not that imhist should be [0.25]*4, but:

>>> imhist
array([ 0.33333333,  0.33333333,  0.33333333,  0.33333333])
>>> imhist * np.diff(bins)
array([ 0.25,  0.25,  0.25,  0.25])
>>> (imhist * np.diff(bins)).sum()
1.0

That's the invariant you get. Whenever you change the bins you'll change those values.
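
To make the invariant concrete, here is a small sketch that bins the same four values three different ways and checks the area each time. It uses density=True rather than the deprecated normed keyword. The third binning, [0, 1, 3], is an illustrative choice of mine with unequal widths, not something from the question.

```python
import numpy as np

data = [0, 1, 2, 3]

# Same data, three binnings; the last one has unequal bin widths.
binnings = [4, 2, [0, 1, 3]]

areas = []
for bins in binnings:
    hist, edges = np.histogram(data, bins=bins, density=True)
    # Area under the histogram: density value times bin width, summed.
    area = np.sum(hist * np.diff(edges))
    areas.append(area)
    print(bins, hist, area)

# Keep the unequal-width result for inspection.
uneven_hist, uneven_edges = np.histogram(data, bins=[0, 1, 3], density=True)
```

With the unequal bins, the counts are 1 and 3 and the widths are 1 and 2, so the density values are 1/(4*1) = 0.25 and 3/(4*2) = 0.375. The individual values change with the binning, but the area stays at 1 in every case.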

0
votes
numpy.histogram(input, bins=10, density=True)

Using density=True is equivalent to the following steps.

1. First, based on the number of bins and the minimum and maximum values of the input, it calculates the bin width and then builds a histogram where the X axis holds the bins and the Y axis holds the number of inputs that fall into each bin.

2. Next, it calculates the relative frequency for each bin, i.e. it divides the count in each bin by the total number of data points. These relative frequencies can also be interpreted as probability values. This interpretation is based on the law of large numbers.

3. The Y values of a PDF are not actual probabilities; they are probability densities. So if you divide the relative frequencies by the bin width, you get the same result as the one you get by passing density=True.
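
The three steps above can be sketched by hand and compared against density=True. This is only a reconstruction of the arithmetic using the data from the question, not a description of how NumPy is implemented internally. The variable names are my own.

```python
import numpy as np

data = [0, 1, 2, 3]

# Step 1: raw counts per bin and the bin edges.
counts, edges = np.histogram(data, bins=4)

# Step 2: relative frequency of each bin (these sum to 1).
rel_freq = counts / counts.sum()

# Step 3: divide by the bin widths to get a density.
manual_density = rel_freq / np.diff(edges)

# Compare with what NumPy returns for density=True.
numpy_density, _ = np.histogram(data, bins=4, density=True)
print(rel_freq)
print(manual_density)
print(np.allclose(manual_density, numpy_density))
```

Here rel_freq is the [0.25, 0.25, 0.25, 0.25] the question expected, and dividing by the bin width of 0.75 gives the 1/3 values that the function actually returns.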