0
votes

I am trying to make a bar plot in Python using name, prop_male and total. The idea is to have the name on one axis and then show the total streams and what proportion of them are male.

I have the following example data:

NAME    prop_male    total 
GGD     0.254147    727240
CCG     0.216658    323510
PPT     0.265414    251023
MMMA    0.185105    210416
JKK     0.434557    201594
BBD     0.279319    198998
KNL.    0.277761    190246
TSK     0.277653    171030
LIS     0.218444    165168
BRK     0.44755     161124

I tried this, but somehow I'm missing a trick:

import pandas as pd
import seaborn as sns

x, y, hue = "name", "proportion", "total"

(df[x]
 .groupby(df[hue])
 .value_counts(normalize=True)
 .rename(y)
 .reset_index()
 .pipe((sns.barplot, "data"), x=x, y=y, hue=hue))

Could someone suggest a meaningful plot where I can show all three pieces of information together?

Thanks in advance


2 Answers

2
votes

There are endless ways to plot this information. However, the two columns are on very different scales (proportions below 1 versus totals in the hundreds of thousands), so a single bar chart will not show both visibly without some adjustment.

The best way is probably the one suggested by Mr. T, and the plot looks really nice (I'd add a legend, however, to explain that the dark blue bar is the male count while the light blue one is the total).
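A minimal sketch of that legend suggestion, assuming plain matplotlib bars rather than seaborn and only the first three rows of the question's data. The column name `male` and the axis label "streams" are chosen here for illustration:

```python
import pandas as pd
from matplotlib import pyplot as plt

# first three rows of the question's example data
df = pd.DataFrame({
    "name": ["GGD", "CCG", "PPT"],
    "prop_male": [0.254147, 0.216658, 0.265414],
    "total": [727240, 323510, 251023],
})
# absolute number of male streams, so both bars share one scale
# ("male" is an illustrative column name)
df["male"] = df["prop_male"] * df["total"]

fig, ax = plt.subplots()
# draw the total first, then the male count over it
ax.bar(df["name"], df["total"], color="lightblue", label="total")
ax.bar(df["name"], df["male"], color="blue", label="male")
ax.set_ylabel("streams")
ax.legend()  # built from the two label= entries above
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or store the figure.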

For completeness, I'll report two other options, which give less interpretable results:

1. You can scale the "total" column to make it visible.
2. You can do a scatter plot.

import matplotlib.pyplot as plt
import matplotlib
import numpy as np

Name = ['GGD', 'CCG', 'PPT', 'MMMA', 'JKK', 'BBD',  'KNL']
prop_male = [0.254147, 0.216658, 0.265414, 0.185105, 0.434557, 0.279319, 
0.277761]
total = [727240, 323510, 251023, 210416, 201594, 198998,  190246]

#Plot as bar

x = np.arange(len(Name))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(1,2, figsize=(20,8))
rects1 = ax[0].bar(x - width/2, [float(i)/1e6 for i in total], width, 
             label=r'Total $\times$ 1e-6 ')
rects2 = ax[0].bar(x + width/2, prop_male, width, label='Prop_male')

ax[0].set_xticks(x)
ax[0].set_xticklabels(Name, size=15)
ax[0].legend()

ax[0].set_ylabel("Counts [a.u.]", size=15)

#plot as scatter

# discrete colormap with one colour per name
cmap = plt.get_cmap('viridis', len(Name))
norm = matplotlib.colors.Normalize(vmin=0, vmax=len(Name))
colors = cmap(np.arange(len(Name)))

# px, py avoid overwriting the bar positions stored in x
for px, py, c in zip(prop_male, total, colors):
    ax[1].plot(px, py, 'o', color=c, markersize=10, alpha=0.8)

sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array([])
# pass ax explicitly (recent matplotlib versions require it)
# and centre each tick on its colour band
cbar = fig.colorbar(sm, ax=ax[1], ticks=np.arange(len(Name)) + 0.5)
cbar.ax.set_yticklabels(Name)
cbar.set_label('Name', size=15)

ax[1].set_xlabel("prop_male", size=15)
ax[1].set_ylabel("total", size=15)

plt.show()

The plot should look something like this:

[image: bar chart (left) and scatter plot with colorbar (right)]

2
votes

There are various ways to achieve this. One would be to calculate the number of males and plot the bars on top of each other:

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.DataFrame({"name": list("ABC"), "proportion": [0.2, 0.7, 0.1], "total": [123, 321, 213]})
df["male"] = df.proportion * df.total

ax = sns.barplot(data=df, x="name", y="total", color="lightblue")
sns.barplot(data=df, x="name", y="male", color="blue", ax=ax)
ax.set_ylabel("male/total")
plt.show()

Sample output: [image: overlaid bar chart]

The hue parameter in seaborn is usually a classification category in long-form data. To illustrate this statement, here is some sample code:

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.DataFrame({"name": list("ABC"), "proportion": [0.2, 0.7, 0.1], "total": [123, 321, 213]})
df["male"] = df.proportion * df.total

#transform the data from wide to long form
df_plot = df.melt(id_vars=["name"], value_vars=["male", "total"])
#use the former column names as categories in a barplot
sns.barplot(data=df_plot, x="name", y="value", hue="variable")
plt.show()

Output: [image: grouped bar chart]
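To make the wide-to-long step more concrete, here is a small sketch that only prints the two frames, using the same toy data as above. Nothing is plotted; it just shows which columns `melt` produces and why `variable` can serve as `hue`:

```python
import pandas as pd

df = pd.DataFrame({"name": list("ABC"), "proportion": [0.2, 0.7, 0.1], "total": [123, 321, 213]})
df["male"] = df.proportion * df.total

# wide form: one row per name, one column per measure
print(df[["name", "male", "total"]])

# long form: one row per (name, measure) pair;
# "variable" holds the former column name, "value" the number
df_plot = df.melt(id_vars=["name"], value_vars=["male", "total"])
print(df_plot)

# every name now appears once per category, which is what hue expects
print(df_plot.groupby("variable")["name"].count())
```

The long frame has twice as many rows as the wide one, because each name is repeated for `male` and for `total`.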

You could also decide to present the percentage and the total number separately:

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.DataFrame({"name": list("ABC"), "proportion": [0.2, 0.7, 0.1], "total": [123, 321, 213]})

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
sns.barplot(data=df, x="name", y="total", color="lightblue", ax=ax1)
sns.lineplot(data=df, x="name", y= "proportion", color="black", lw=3, ls="--", ax=ax2)
plt.show()

Sample output: [image: bar chart with a dashed line on a second y-axis]
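With two y-axes it is easy to lose track of which scale belongs to which series. A possible refinement, sketched here with plain matplotlib and the first three rows of the question's data, labels both axes and formats the proportion as a percentage. The label texts and the fixed 0 to 1 range are assumptions made for illustration:

```python
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter

# first three rows of the question's example data
df = pd.DataFrame({
    "name": ["GGD", "CCG", "PPT"],
    "prop_male": [0.254147, 0.216658, 0.265414],
    "total": [727240, 323510, 251023],
})

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

# left axis: absolute totals as bars
ax1.bar(df["name"], df["total"], color="lightblue")
ax1.set_ylabel("total")

# right axis: male proportion as a dashed line
ax2.plot(df["name"], df["prop_male"], color="black", lw=3, ls="--", marker="o")
ax2.set_ylim(0, 1)  # assumed range: a proportion lies between 0 and 1
ax2.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
ax2.set_ylabel("male share")
```

Call `plt.show()` afterwards to display the figure. Fixing the right-hand range to 0 to 1 keeps the percentages comparable across plots, at the cost of a flatter-looking line.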

Have I mentioned yet that there is more than one way?