8
votes

how to create a dictionary of pandas dataframes, and return the dataframes into excel worksheets?

Hi All,

I am learning pandas and python, and I want to create a dictionary which contains a some dataframes, which I can then run metrics over each dataframe. With each unique cluster name (one of the columns) I would like to create a dataframe (subset of original dataframe.

Then I would like to be able to select it, run metrics over it, putting the results in a new dataframe, and then place the original dataframe (each subset) into a separate worksheet using xlsxwriter python library.

#create dictionary object

    c_dict = {}

#get a list of the unique names

c_dict= data.groupby('Cluster').groups

#create a dictionary of dataframes, one for each cluster

for cluster in c_dict.items():
    df = data[data['Cluster']==cluster
    c_dict[cluster] =df                                                        <<< im getting invalid syntax here

#go through the dictionary and create a worksheet and put the dataframe in it.

for k,v in c_dict.items():
    dataframe = GetDF(k)                                                            <<< creating worksheets and puts the data from the dataframe > worksheet is not working because of invalid syntax when trying to create dataframe dictionary ^^
    dataframe.to_excel(writer,sheet_name=k)
writer.save

#get the dataframe from the dictionary,

GetDF(dictionary_key)
          return c_dict[dictionary_key]
1
There are a lot of problems with your code. You assign c_dist={}, then immediately assign c_dict to a groupby object. You also don't call the save method on writer, you just reference the function. You also check for the equality of c_dict[cluster] to df, it's not clear why that's happening.munk
I have a large dataset. I want to create smaller datasets (dataframes) based on the unique values of the cluster column. In that code I wanted to create the smaller datasets(dataframes) and assign them to a dictionary so I could call them later. How do you call rather than reference the writer object?>yoshiserry
To call a function, you need parenthesis after it with the appropriate argument list. You're not calling the object, you're calling a method on the object. I'm guessing you meant writer.save(). You're probably better off just using the groupby object, where you can reference a group, rather than trying to stick it into a dict.munk
I'd like to learn how to stick it into the dict, but thanks for the details around calling functions. I'll read the docoyoshiserry
so does anyone actually know how to format my dictionary so that I can store dataframes as I have described?yoshiserry

1 Answers

6
votes

I think this is what you're looking for. As I said in the comments, it's probably not the right solution and it's definitely not idomatic for pandas DataFrames.

import pandas as pd

groups = data.groupby('Cluster')

#create a dictionary of dataframes, one for each cluster
c_dict = {k: pd.DataFrame(v) for k, v in groups.groups.iteritems() }

If you want to save this to an excel file, the documentation is here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

There is a nice example at the bottom that will do what you need. Hint: use for k,v in myDict.iteritems() to get keys and values.