0
votes

I have a seaborn countplot that uses the `hue` parameter. This is what the plot looks like:

(image: countplot of GENDER with Status as hue)

Male

Total count for male = 240 669, Total count for active male = 130 856, Total count for churn male = 109 813

M (Active): 130856/240669 = 54.4% and M (Churn): 109813/240669 = 45.6%

Female

Total count for female = 198 408, Total count for active female = 111 107, Total count for churn female = 87 301

F (Active): 111107/198408 = 56% and F (Churn): 87301/198408 = 44%
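
To make the difference between the two denominators concrete, here is a small arithmetic sketch using the counts above. It assumes the DataFrame contains only these two genders, so that the grand total stands in for `len(df)`; the variable names are illustrative only.

```python
# Counts copied from the figures above.
active = {"M": 130_856, "F": 111_107}
churn = {"M": 109_813, "F": 87_301}

# Assumption: no other gender values, so this equals len(df).
grand_total = sum(active.values()) + sum(churn.values())

for gender in ("M", "F"):
    gender_total = active[gender] + churn[gender]
    # Share of the whole dataset (what dividing by len(df) gives)
    of_all = active[gender] / grand_total
    # Share within the gender (what I actually want)
    of_gender = active[gender] / gender_total
    print(f"{gender} Active: {of_all:.1%} of all rows, {of_gender:.1%} of {gender}")
```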

I want the percentages within each gender to add up to 100%, as in the calculations above, instead of the percentages shown in the attached plot, which are fractions of the whole dataset.

This is the code I used:

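import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,6))
colours = ['b','red']
# Pass the column as x= (recent seaborn versions no longer accept it positionally)
ax = sns.countplot(x=df['GENDER'], hue=df['Status'],
                   order=df['GENDER'].value_counts().index, palette=colours)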
plt.title("GENDER VS STATUS",fontsize=15)
plt.tight_layout()


plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

total = float(len(df))
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
        height + 3,
        '{0:.1%}'.format(height/total),
        ha="center", fontsize=15)



print(df['GENDER'].value_counts(normalize=True))

2 Answers

2
votes

In your annotation loop, you have to divide each bar's height by the total for its own gender (M or F) rather than by the length of the whole DataFrame. Keep in mind that countplot draws the patches grouped by hue. That is, the list of patches is ordered M hue1 / F hue1 / M hue2 / F hue2, so you can build the totals as `[total M, total F, total M, total F]` and loop through that list at the same time as your patches:

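import matplotlib.pyplot as plt
import seaborn as sns

colours = ['b','red']
totals = df['GENDER'].value_counts()   # per-gender totals, also used as the bar order
n_hues = df['Status'].unique().size
# Pass the column as x= (recent seaborn versions no longer accept it positionally)
ax = sns.countplot(x=df['GENDER'], hue=df['Status'], order=totals.index, palette=colours)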
plt.title("GENDER VS STATUS",fontsize=15)
plt.tight_layout()


plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

# Repeat the per-gender totals once per hue so they line up with ax.patches
temp_totals = totals.values.tolist()*n_hues
for p,t in zip(ax.patches,temp_totals):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
        height + 3,
        '{0:.1%}'.format(height/t),
        ha="center", fontsize=15)
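
To see why repeating the totals lines up with the bars, here is a plotting-free sketch of the same pairing. It uses the counts from the question and assumes the hue order is Active, then Churn; the names `heights`, `divisors` and `labels` are illustrative and not part of seaborn.

```python
# Counts taken from the question.
counts = {
    ("M", "Active"): 130_856, ("F", "Active"): 111_107,
    ("M", "Churn"): 109_813, ("F", "Churn"): 87_301,
}
genders = ["M", "F"]         # x-axis order (most frequent first)
hues = ["Active", "Churn"]   # assumed hue order

# Bars listed hue by hue: M/Active, F/Active, M/Churn, F/Churn
heights = [counts[(g, h)] for h in hues for g in genders]

totals = [sum(counts[(g, h)] for h in hues) for g in genders]  # [total M, total F]
divisors = totals * len(hues)  # [total M, total F, total M, total F]

# Each bar is divided by the total of its own gender
labels = [f"{height / total:.1%}" for height, total in zip(heights, divisors)]
print(labels)
```

Because the totals are repeated once per hue, the hue order itself does not matter; only the gender order within each hue group has to match `order=`.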


-1
votes

I always find it easier to separate data aggregation from plotting. So I would suggest first calculating all the values you need and then drawing a bar plot from them. (No seaborn needed here.)

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"gender" : list("MMMMFFFFFF"),
                   "category" : list("BAABABBAAA")})

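# Counts per (gender, category): one row per gender, one column per category
piv = df.groupby(["gender", "category"]).size().unstack("category")
# Divide each row by its gender total so the categories of a gender sum to 100%
grouped_perc = piv.div(piv.sum(axis=1), axis=0)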

ax = piv.plot.bar()
for bar, val in zip(ax.patches, grouped_perc.T.values.flat):
    ax.annotate('{0:.1%}'.format(val), 
                xy=(bar.get_x()+bar.get_width()/2., bar.get_height()),
                xytext=(0,5), textcoords="offset points", ha="center")

ax.margins(y=0.1)
plt.show()
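
As a variation on the aggregation step, pandas can also produce the per-gender fractions in one call with `pd.crosstab` and `normalize="index"`. A minimal sketch on the same toy data; the name `shares` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"gender": list("MMMMFFFFFF"),
                   "category": list("BAABABBAAA")})

# normalize="index" divides every row by its own total,
# so the categories within each gender sum to 1.
shares = pd.crosstab(df["gender"], df["category"], normalize="index")
print(shares)
```

The resulting table has the same layout as the pivot above (genders as rows, categories as columns), so it could be used for the annotations in the same way.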
