5
votes

I have a table like:

value    type
10       0
12       1
13       1
14       2

Generate some dummy data:

import numpy as np

value = np.random.randint(1, 20, 10)
type = np.random.choice([0, 1, 2], 10)

I want to accomplish a task in Python 3 with matplotlib (v1.4):

  • plot a histogram of value
  • group by type, i.e. use different colors to differentiate types
  • the bars should be "dodged", i.e. placed side by side
  • since the range of value is small, I would use identity for bins, i.e. one bin per integer value with a bin width of 1

The questions are:

  • How can I assign colors to the bars based on the values of type, drawing the colors from a colormap (e.g. Accent or another cmap in matplotlib)? I don't want to use named colors (e.g. 'b', 'k', 'r').
  • The bars in my histogram overlap each other. How can I "dodge" them?

Note

  1. I have tried Seaborn, matplotlib and pandas.plot for two hours and failed to get the desired histogram.
  2. I read the examples and the User's Guide of matplotlib. Surprisingly, I found no tutorial on how to assign colors from a colormap.
  3. I have searched on Google but failed to find a succinct example.
  4. I guess one could accomplish the task with matplotlib.pyplot alone, without importing a bunch of modules such as matplotlib.cm and matplotlib.colors.

2 Answers

7
votes

For your second question (dodging the bars), we can create a dummy column equal to 1, and then generate counts by summing this column, grouped by value and type. Plotting the resulting table as a bar chart places the bars of each type side by side.

For your first question (the colors), you can pass the colormap directly into plot using the colormap parameter:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
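import seaborn
seaborn.set()  # make the plots look pretty

# value and type are the arrays generated in the question
df = pd.DataFrame({'value': value, 'type': type})
df['dummy'] = 1
# Sum the dummy column per (value, type) pair, then move type into the columns
ag = df.groupby(['value', 'type']).sum().unstack()
ag.columns = ag.columns.droplevel()  # drop the 'dummy' level of the column index

# One group of bars per value, one colored bar per type
ag.plot(kind='bar', colormap=cm.Accent, width=1)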
plt.show()
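
As a side note, the dummy column is not strictly needed: pandas can build the same table of counts with pd.crosstab. Here is a minimal sketch on freshly generated dummy data (the seed and the names ag and counts are just my own choices for illustration):

```python
import numpy as np
import pandas as pd

# Dummy data in the same shape as in the question
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({'value': rng.integers(1, 20, size=10),
                   'type': rng.choice([0, 1, 2], size=10)})

# Dummy-column approach, with missing (value, type) pairs filled with 0
df['dummy'] = 1
ag = df.groupby(['value', 'type'])['dummy'].sum().unstack(fill_value=0)

# The same table of counts in a single call
counts = pd.crosstab(df['value'], df['type'])
print(counts)
```

counts can then be plotted in the same way as ag, e.g. counts.plot(kind='bar', colormap=cm.Accent, width=1).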


0
votes

Whenever you need to plot a variable grouped by another (using color), seaborn usually provides a more convenient way to do that than matplotlib or pandas. So here is a solution using the seaborn histplot function:

import numpy as np                 # v 1.19.2
import pandas as pd                # v 1.1.3
import matplotlib.pyplot as plt    # v 3.3.2
import seaborn as sns              # v 0.11.0

# Set parameters for random data
rng = np.random.default_rng(seed=1) # random number generator
size = 50
xmin = 1
xmax = 20

# Create random dataframe
df = pd.DataFrame(dict(value = rng.integers(xmin, xmax, size=size),
                       val_type = rng.choice([0, 1, 2], size=size)))

# Create histogram with discrete bins (bin width is 1), colored by type
fig, ax = plt.subplots(figsize=(10,4))
sns.histplot(data=df, x='value', hue='val_type', multiple='dodge', discrete=True,
             edgecolor='white', palette=plt.cm.Accent, alpha=1)

# Create x ticks covering the range of all integer values of df['value']
ax.set_xticks(np.arange(df['value'].min(), df['value'].max()+1))

# Additional formatting
sns.despine()
ax.get_legend().set_frame_on(False)

plt.show()

histogram_grouped

As you can see, since this is a histogram and not a bar plot, there is no space between the bars except where x-axis values are absent from the dataset, as with values 12 and 14.
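
For comparison, since the question asked whether this can be done with matplotlib.pyplot alone, here is a rough sketch of a similar dodged histogram without seaborn. It samples one color per type from the Accent colormap and uses bin edges at half-integers, which as far as I can tell mimics what discrete=True does. The variable names (kind, kinds, edges) and the seed are my own assumptions, and I have not tried this on matplotlib 1.4:

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy data; `kind` is used instead of `type` to avoid shadowing the builtin
rng = np.random.default_rng(seed=1)
value = rng.integers(1, 20, size=50)
kind = rng.choice([0, 1, 2], size=50)
kinds = np.unique(kind)

# Sample one RGBA color per type at evenly spaced positions of the colormap
cmap = plt.cm.Accent
colors = [cmap(x) for x in np.linspace(0, 1, len(kinds))]

# Edges at half-integers: each bin has width 1 and is centered on an integer
edges = np.arange(value.min(), value.max() + 2) - 0.5

fig, ax = plt.subplots(figsize=(10, 4))
# Passing a list of arrays makes hist draw the groups side by side in each bin
counts, _, _ = ax.hist([value[kind == k] for k in kinds], bins=edges,
                       color=colors, label=[f'type {k}' for k in kinds])
ax.set_xticks(np.arange(value.min(), value.max() + 1))
ax.legend(frameon=False)
```

Call plt.show() afterwards to display the figure. Calling the colormap with a number between 0 and 1 is what turns it into concrete colors, so no named colors are needed.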

Since the accepted answer provides a bar plot in pandas, and a bar plot can be a reasonable way to display a histogram in certain situations, here is how to create one with seaborn using the countplot function:

# For some reason the palette argument in countplot is not processed the
# same way as in histplot, so here I fetch the colors from the previous
# example to make the two plots easier to compare. Note that a set is
# unordered, so the assignment of colors to types may differ between plots.
colors = list({patch.get_facecolor() for patch in ax.patches})

# Create bar chart of counts of each value grouped by type
fig, ax = plt.subplots(figsize=(10,4))
sns.countplot(data=df, x='value', hue='val_type', palette=colors,
              saturation=1, edgecolor='white')

# Additional formatting
sns.despine()
ax.get_legend().set_frame_on(False)

plt.show()

countplot_grouped

As this is a bar plot, the values 12 and 14 are not included, which produces a somewhat misleading plot because no empty space is shown for those values. On the other hand, there is some space between each group of bars, which makes it easier to see which value each bar belongs to.
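
If the missing values matter, one possible workaround is to compute the counts yourself and reindex them over the full integer range before drawing the bars, so that absent values show up as empty slots. This is only a sketch with pandas, assuming the same kind of dummy data as above (the names counts, full_range and counts_full are made up for the example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
df = pd.DataFrame({'value': rng.integers(1, 20, size=50),
                   'val_type': rng.choice([0, 1, 2], size=50)})

# Count rows per (value, val_type) pair, one column per type
counts = df.groupby(['value', 'val_type']).size().unstack(fill_value=0)

# Add rows of zeros for integer values that never occur in the data
full_range = np.arange(df['value'].min(), df['value'].max() + 1)
counts_full = counts.reindex(full_range, fill_value=0)
print(counts_full)
```

counts_full.plot(kind='bar', colormap='Accent') should then give a bar plot with a gap wherever a value is absent.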