
I'm working with a CSV file that I've read into pandas using the following command:

RawData = pd.read_csv(rawData_file_path, engine='python', header=[0,1])

This creates a DataFrame whose columns have a two-level header, taken from the first two rows of the file. Something like this:

-------------------------------
|    Group 1   |    Group 2   |
-------------------------------
|   A   |   B  |   A   |  B   |
-------------------------------
|  data | data |  data | data |
-------------------------------
|  data | data |  data | data |
-------------------------------

I'm trying to draw a count plot with seaborn (sns.countplot), but I'm running into issues because the second header row is not being treated as a header. The column I'm trying to analyze is a simple gender column (male / female). However, because of how the results are laid out, the column looks like this:

row 1: What is your gender? 
row 2: Response 
row n: Male or Female etc.

I try to plot this using the countplot:

sns.countplot(x=['What is your gender?'], data=RawData)

However, I get this error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

When I flattened the DataFrame, the seaborn plot worked, but instead of counting Male and Female, it counted Male, Female and 'Response'. This has led me to believe that the second header row is what causes the ValueError in the unflattened DataFrame.

This is the first of many plots I will have to make, and some of the later columns are more complex and will need that second header row as a reference. As such, I can't simply flatten the DataFrame.

Can anyone suggest a workaround here? I'd like to nip this in the bud now, with a simple count plot, before I have to start on the more complex visualizations such as heatmaps.


1 Answer


Seaborn functions like countplot assume that you have tidy data. Briefly: each variable should be a column, and each observation should be a row. You will want to find a way to format your dataframe so that it is in this basic structure, and then you will be able to use seaborn to plot it.
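
As a minimal sketch of one way to get there without flattening the whole DataFrame: with a two-level header, each column is addressed by a tuple, so a single column can be pulled out and renamed into a small tidy frame just for plotting. The sample data, the second question, and the names raw, tidy, and plot_gender_counts below are made up for illustration; only the gender question and its 'Response' sub-header come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey CSV: two header rows, then data.
csv_text = """What is your gender?,What is your age?
Response,Response
Male,34
Female,29
Female,41
"""

# Same header=[0, 1] idea as in the question; this yields MultiIndex columns.
raw = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Select one column by its (level 0, level 1) tuple; the result is a Series.
gender = raw[("What is your gender?", "Response")]

# Tidy frame for plotting: one flat column, one row per respondent.
# The original two-level DataFrame (raw) is left untouched.
tidy = gender.rename("gender").to_frame()

# Quick check of what the plot would show.
counts = tidy["gender"].value_counts()


def plot_gender_counts(frame):
    """Draw the count plot; seaborn is imported here so the rest runs without it."""
    import seaborn as sns

    return sns.countplot(x="gender", data=frame)
```

Calling plot_gender_counts(tidy) should then draw bars for Male and Female only, since 'Response' stays in the header rather than in the data. The same pattern (select by tuple, rename, plot) can be repeated for other columns while the two-level header remains available for the more complex plots.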