python - Split dataframe into relatively even chunks according to length

Question

I have to create a function that splits a provided dataframe into chunks of a given size. For instance, if the dataframe contains 1111 rows, I want to be able to specify a chunk size of 400 rows and get three smaller dataframes with 400, 400 and 311 rows. Is there a convenience function to do the job? What would be the best way to store and iterate over the sliced dataframe?

Example DataFrame

import numpy as np
import pandas as pd

test = pd.concat([pd.Series(np.random.rand(1111)), pd.Series(np.random.rand(1111))], axis = 1)

You can just get the chunk start positions using test.index[::400] and use them to slice the df: first = test.iloc[:400], second = test.iloc[400:800], third = test.iloc[800:] – EdChum
I have more than 50 files with >50k rows each, so I think I will have to generate an additional index in a loop and use df.groupby() – YKY
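EdChum's slicing idea generalizes to any number of rows by stepping through the row positions with `range`. A minimal sketch, reusing the question's example frame and chunk size of 400; the list `chunks` is just one possible way to store the pieces:

```python
import numpy as np
import pandas as pd

# Same shape as the question's example frame: 1111 rows, 2 columns
test = pd.concat([pd.Series(np.random.rand(1111)), pd.Series(np.random.rand(1111))], axis=1)

n = 400
# Start positions 0, 400, 800; iloc stops the final slice at the last row
chunks = [test.iloc[start:start + n] for start in range(0, len(test), n)]

for chunk in chunks:
    print(chunk.shape)
```

Because each slice is positional, this works regardless of what the dataframe's index labels look like.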
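If the 50+ files are CSVs (an assumption; the thread does not say which format they are in), pandas can hand them over in fixed-size pieces at read time via the `chunksize` argument of `read_csv`, so no extra index is needed. A sketch that writes a 1111-row CSV to an in-memory buffer in place of a real file:

```python
import io

import numpy as np
import pandas as pd

# Stand-in for one input file: a 1111-row, 2-column CSV held in memory
buf = io.StringIO()
pd.DataFrame(np.random.rand(1111, 2)).to_csv(buf, index=False)
buf.seek(0)

# With chunksize, read_csv returns an iterator of DataFrames instead of one frame
sizes = []
with pd.read_csv(buf, chunksize=400) as reader:
    for chunk in reader:
        sizes.append(chunk.shape)

print(sizes)  # [(400, 2), (400, 2), (311, 2)]
```

With real files you would pass each path instead of `buf`, which also avoids holding a whole 50k-row file in memory at once.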

sinhrks · Accepted Answer · 2015-10-27T12:31:35

You can floor-divide a sequence of row positions (0 up to the number of rows in the dataframe) by the chunk size and pass the result to groupby, which splits the dataframe into equally sized chunks, with the remainder in the last one:

n = 400
# Row positions 0..1110 floor-divided by n give group labels 0, 1, 2
for g, df in test.groupby(np.arange(len(test)) // n):
    print(df.shape)
# (400, 2)
# (400, 2)
# (311, 2)
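If "relatively even" means a fixed number of chunks rather than a fixed chunk size, a different technique applies: `np.array_split` divides the row positions into `k` parts whose lengths differ by at most one, and `iloc` then selects each part. A sketch, where `k = 3` is an illustrative choice rather than something from the question:

```python
import numpy as np
import pandas as pd

test = pd.concat([pd.Series(np.random.rand(1111)), pd.Series(np.random.rand(1111))], axis=1)

k = 3  # desired number of chunks, not rows per chunk
# array_split tolerates uneven division: the first len % k parts get one extra row
positions = np.array_split(np.arange(len(test)), k)
chunks = [test.iloc[p] for p in positions]

print([c.shape for c in chunks])  # [(371, 2), (370, 2), (370, 2)]
```

Splitting the positions rather than the dataframe itself keeps the dataframe handling inside plain `iloc` calls.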
