
I am having issues using Pandas and Pyplot to produce a bar chart. I am trying to use the mean of a column for the bar chart y-axis / bar height with the x-axis being a bar for each gender.

I can plot a bar chart with gender displaying correctly by passing the gender column as x, but when I pass just the fare column as y, the plot fails. When I pass the mean of the fare column instead, the bars plot, but all at the same height (the overall mean fare).

What I am trying to do is to get the bar height = mean of the fare for that gender.

import pandas as pd                                 # import pandas package (install via settings first)
from matplotlib import pyplot as plt                # import pyplot package (install via settings first)

# train_df pulls from a .CSV file

train_embarkS_survive = train_df.filter(['Sex', 'Embarked', 'Fare', 'Survived'])    
train_embarkS_survive = train_embarkS_survive.query('Embarked == "S" and Survived == 1')

plt.figure('Q13: ')
plt.bar(train_embarkS_survive['Sex'], train_embarkS_survive['Fare'].mean(axis=0)) 
plt.xlabel('Sex')                                                
plt.ylabel('Fare')                                               
plt.title('Embarked = S | Survived = 1')    
plt.show()  

The plt.bar of my filtered dataframe using the 'Sex' column (categorical variable with male, female unique values) and the mean of the 'Fare' Column produces a bar chart with equal bar heights (the mean of all fares, not just those for each category male, female).


In actuality, the fare mean for female = 44.60, male = 30.37. How can I get these calculated means as the respective bar heights?

I have tried using groupby() but plt.bar would not accept

train_embarkS_survive.groupby(['Sex']).mean()

for the y-axis argument.

Can you show a sample of your data? - Fabio Mendes Soares
The data is the test.csv file located at kaggle.com/c/titanic/data. Omitted from my code is the dataframe I first loaded it into (named train_df, used for other purposes). - Paul C. Horton

1 Answer


You were on the right track with groupby. I'd save the aggregated dataframe before plotting the bar chart. The pyplot bar function takes the x and y values as separate arguments, so you need to pass them separately: here x is the index of the grouped result ('female' and 'male') and y is the column 'Fare', which holds the mean fare for each group.

df_mean = train_embarkS_survive.groupby(['Sex']).agg({'Fare': 'mean'})
plt.bar(df_mean.index, df_mean['Fare'])
plt.show()

The bar chart now shows the respective mean fare for male and female.

(Image: bar chart of mean fare by sex)
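To see why the original call drew equal bars, here is a minimal self-contained sketch. It assumes a small made-up dataframe: sample_df and its fares are hypothetical stand-ins, not the Titanic data. Calling .mean() on the whole 'Fare' column returns a single number, and plt.bar applies a scalar height to every bar. Grouping first gives one mean per category.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for train_embarkS_survive (made-up fares, not the Titanic data)
sample_df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Fare": [40.0, 20.0, 50.0, 30.0],
})

# A single scalar: passing this as the height gives every bar the same height
overall_mean = sample_df["Fare"].mean()

# One mean per category: the index holds the labels, the values hold the heights
fare_by_sex = sample_df.groupby("Sex")["Fare"].mean()

plt.figure()
bars = plt.bar(fare_by_sex.index, fare_by_sex.values)
plt.xlabel("Sex")
plt.ylabel("Mean fare")
# Call plt.show() here to display the figure in an interactive session
```

Selecting the 'Fare' column before calling mean() returns a Series rather than a dataframe, so its index and values can be passed to plt.bar directly.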