
I have written the following code to plot a histogram from the values in one column of a CSV file:

import pandas as pd
import matplotlib.pyplot as plt
import numpy

class createHistogram():

    def __init__(self,csv_file):
        self.csv_file = csv_file

    def load_csv(self):
        bin_edge = range(0,100,10)
        tp_data = pd.read_csv(self.csv_file)
        dataframe = pd.DataFrame(tp_data)['tp']
        dataframe.hist(bins=bin_edge)
        plt.show()
        return tp_data

This gives me a histogram in which each bin counts the values less than 10, less than 20, and so on, but I want the bins to be:

bin_value<=10

10< bin_value <=20

20 < bin_value <= 30, and so on

I am new to the pandas module.


1 Answer


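You can use pandas' native pd.cut, which assigns values to bins defined as intervals. By default the intervals are closed on the right, which is the behaviour you are asking for.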

import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(1, 100, 50))
bins = range(0, 101, 10)

pd.cut sorts the data into these bins and returns a categorical Series whose categories are the intervals:

In [4]: pd.cut(ser, bins).cat.categories
Out[4]: 
IntervalIndex([(0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60], (60, 70], (70, 80], (80, 90], (90, 100]]
              closed='right',
              dtype='interval[int64]')
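To check how values that sit exactly on a bin edge are assigned, here is a minimal sketch with made-up sample values (the numbers are mine, not from the question). It relies on the default right=True of pd.cut. One thing to watch: a value equal to the lowest edge (0 here) falls outside the first interval unless you pass include_lowest=True.

```python
import pandas as pd

bins = range(0, 101, 10)
values = pd.Series([10, 11, 20, 100])  # illustrative values only

# With the default right=True, each interval includes its right edge:
# 10 lands in (0, 10], 11 and 20 in (10, 20], 100 in (90, 100].
binned = pd.cut(values, bins)
print(binned.tolist())

# A value equal to the lowest edge is not covered by (0, 10] ...
zero = pd.Series([0])
without_lowest = pd.cut(zero, bins)
print(without_lowest.isna().all())

# ... unless include_lowest=True widens the first interval to contain it.
with_lowest = pd.cut(zero, bins, include_lowest=True)
print(with_lowest.notna().all())
```

If your 'tp' column can contain zeros, include_lowest=True is probably what you want for the first bin.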

If you would like to further plot them, it would go something like this:

In [5]: pd.cut(ser, bins).value_counts().plot(kind='bar')
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x10e673b70>

Bar plot
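Applied to the question's data, the same idea might look like the sketch below. The inline CSV text is a hypothetical stand-in for your file; only the 'tp' column name comes from the question. Note that value_counts() orders the result by frequency by default, so sort_index() is added here to put the counts back in bin order.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; replace with pd.read_csv(your_path).
csv_text = "tp\n5\n10\n10\n20\n35\n90\n100\n"
tp = pd.read_csv(io.StringIO(csv_text))["tp"]

bins = range(0, 101, 10)

# Count values per right-closed bin, then restore bin order.
counts = pd.cut(tp, bins).value_counts().sort_index()
print(counts)
```

Calling counts.plot(kind='bar') followed by plt.show() then draws the bars from (0, 10] on the left to (90, 100] on the right.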