1
votes

Is there a good way to split a dataframe into chunks and automatically assign each chunk to its own named dataframe?

For example, dfmaster has 1000 records. I'd like to split it into chunks of 200 and create df1, df2, …, df5. Any guidance would be much appreciated.

I've looked on other boards and found no guidance on a function that automatically creates new dataframes.

2
If you're reading your data with pd.read_csv or anything similar, you can use the chunksize parameter: pandas.pydata.org/pandas-docs/version/0.23/generated/…. Then a simple loop such as for chunk in pd.read_csv(your_file, chunksize=200) gives you one dataframe of up to 200 rows per iteration. - RoyM
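A minimal sketch of the chunksize idea from the comment above, assuming the data comes from a CSV. The in-memory csv_text, the "value" column and the chunks dictionary are made up for illustration; with a real file you would pass its path to pd.read_csv instead. Keeping the pieces in a dictionary keyed "df1", "df2", … is one common alternative to creating separate variables.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk: a header plus 1000 rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(1000))

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of one big DataFrame; each piece has at most 200 rows.
chunks = {}
reader = pd.read_csv(io.StringIO(csv_text), chunksize=200)
for i, chunk in enumerate(reader, start=1):
    chunks[f"df{i}"] = chunk  # keys: "df1", "df2", ..., "df5"
```

Each piece is then available as chunks["df1"], chunks["df2"], and so on.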

2 Answers

1
votes

Use numpy for splitting:

See example below:

In [2095]: df
Out[2095]: 
     0     1     2    3     4    5     6     7     8     9     10
0  0.25  0.00  0.00  0.0  0.00  0.0  0.94  0.00  0.00  0.63  0.00
1  0.51  0.51   NaN  NaN   NaN  NaN   NaN   NaN   NaN   NaN   NaN
2  0.54  0.54  0.00  0.0  0.63  0.0  0.51  0.54  0.51  1.00  0.51
3  0.81  0.05  0.13  0.7  0.02  NaN   NaN   NaN   NaN   NaN   NaN

In [2096]: np.split(df, 2)
Out[2096]: 
[     0     1    2    3    4    5     6    7    8     9    10
 0  0.25  0.00  0.0  0.0  0.0  0.0  0.94  0.0  0.0  0.63  0.0
 1  0.51  0.51  NaN  NaN  NaN  NaN   NaN  NaN  NaN   NaN  NaN,
      0     1     2    3     4    5     6     7     8    9     10
 2  0.54  0.54  0.00  0.0  0.63  0.0  0.51  0.54  0.51  1.0  0.51
 3  0.81  0.05  0.13  0.7  0.02  NaN   NaN   NaN   NaN  NaN   NaN]

df gets split into 2 dataframes having 2 rows each.

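The second argument is the number of equal sections, not the number of rows per chunk. For 1000 records in chunks of 200, you can do np.split(dfmaster, 5). Note that np.split raises an error when the rows cannot be divided evenly; np.array_split accepts uneven splits.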
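For the 1000-record case from the question, here is a hedged sketch. It is a slight variation on the call above: it splits an array of row positions with np.split and then selects the rows with iloc, so each piece stays an ordinary dataframe. The dfmaster contents, the "value" column and the frames dictionary are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's dfmaster: 1000 rows.
dfmaster = pd.DataFrame({"value": range(1000)})

chunk_size = 200
n_sections = len(dfmaster) // chunk_size  # number of pieces, here 5

# Split the row positions 0..999 into equal groups
# (np.split requires that the length divides evenly).
position_groups = np.split(np.arange(len(dfmaster)), n_sections)

# Build one DataFrame per group and name it df1, df2, ... via dict keys.
frames = {
    f"df{i}": dfmaster.iloc[positions]
    for i, positions in enumerate(position_groups, start=1)
}
```

If the row count is not a multiple of the chunk size, swapping np.split for np.array_split in the same spot gives pieces of nearly equal length instead of an error.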

0
votes

I find these ideas helpful:

solution via list: https://stackoverflow.com/a/49563326/10396469

solution using numpy.split: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.split.html

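For the numpy.split route, just use df = df.values first to convert the dataframe to a numpy.array. Keep in mind that the resulting chunks are plain arrays, so the column labels and index are lost.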
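A rough sketch of both ideas, under stated assumptions: the list version below is a generic slicing pattern and not necessarily what the linked answer does, and the dfmaster contents with columns "a" and "b" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfmaster: 1000 rows, two columns.
dfmaster = pd.DataFrame({"a": range(1000), "b": range(1000, 2000)})

# List approach: slice 200 rows at a time; each item is still a DataFrame.
chunk_size = 200
df_list = [
    dfmaster.iloc[start:start + chunk_size]
    for start in range(0, len(dfmaster), chunk_size)
]

# NumPy approach: .values returns a 2-D ndarray, so every chunk
# is an ndarray without column labels or index.
array_chunks = np.split(dfmaster.values, 5)
```

Here df_list[0] plays the role of df1, df_list[1] of df2, and so on. The list version also handles a final, shorter chunk when the row count is not a multiple of 200.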