
I have a DataFrame and would like to make grouped boxplots for a selection of the data with certain labels (the list boxplots). The plot should show the values, plus a line showing the average of the values in each group.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,30,size=(100, 4)), columns=list('ABCD'))
label = ['A','B','C','D','E','F']
df['label'] = np.random.choice(label, df.shape[0])
boxplots = ['A', 'D']

I can't really figure out how to make grouped boxplots. Do I iterate through the boxplots list and add each one to the plot in each iteration?

Any thoughts are much appreciated!

Just realized I made a mistake: I meant bar chart, NOT boxplot. – user9118870
You can edit your question. – Lambda

1 Answer


Plotting the bars is not that hard using isin and groupby:

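# Keep only the rows whose label is in the selection
df_selection = df[df['label'].isin(boxplots)]

# Sum each column per label, then average those sums across the columns
df_sum = df_selection.groupby('label').sum()
df_mean = df_sum.mean(axis=1)
# Bar group i is centred at x = i, so the mean line spans i-.3 to i+.3
line_data = [(i-.3, i+.3, value) for i, (label, value) in enumerate(df_mean.items())]
x_min, x_max, y = zip(*line_data)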

ax = df_sum.plot.bar()
# hlines returns a LineCollection, so keep ax bound to the Axes
ax.hlines(y, x_min, x_max, linewidth=2, color='k')

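For reference, here is a self-contained sketch that combines the setup from the question with the approach above. The fixed seed, the name selected (standing in for boxplots) and the Agg backend are my own assumptions so that it runs anywhere and gives repeatable data. It also computes the line positions with NumPy instead of a list comprehension, which should be equivalent.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable data
df = pd.DataFrame(rng.integers(0, 30, size=(100, 4)), columns=list("ABCD"))
df["label"] = rng.choice(list("ABCDEF"), size=len(df))
selected = ["A", "D"]  # plays the role of `boxplots` in the question

# Keep only the selected labels, then sum each column per label.
df_sum = df[df["label"].isin(selected)].groupby("label").sum()
# One average per label group, taken across the four column sums.
df_mean = df_sum.mean(axis=1)

# One group of bars per label, one bar per column.
ax = df_sum.plot.bar()

# Bar groups sit at x = 0, 1, ...; draw each mean across its own group.
positions = np.arange(len(df_mean))
lines = ax.hlines(df_mean.to_numpy(), positions - 0.3, positions + 0.3,
                  linewidth=2, color="k")
```

To look at the result, either save it with ax.figure.savefig("bars.png") or drop the matplotlib.use line and call matplotlib.pyplot.show() in an interactive session.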