My aim is to show a bar chart of 3-dimensional data: a categorical x and two continuous series, y1 and y2. The bar heights should come from y1 and the bar color should indicate y2.
This does not seem particularly obscure to me, but I couldn't find a simple, built-in way to use a bar chart to visualise three dimensions. I'm thinking mostly of exploratory purposes, before investigating relationships more formally.
Am I missing a type of plot in the libraries? Is there a good alternative to showing 3d data?
Anyway, here are some things I've tried that aren't particularly satisfying.
Some data for these attempts:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Example data with explicit (-ve) correlation in the two series
n = 10; sd = 2.5
fruits = ['Lemon', 'Cantaloupe', 'Redcurrant', 'Raspberry', 'Papaya',
          'Apricot', 'Cherry', 'Durian', 'Guava', 'Jujube']
np.random.seed(101)
cost = np.random.uniform(3, 15, n)
harvest = 50 - (np.random.randn(n) * sd + cost)
df = pd.DataFrame(data={'fruit':fruits, 'cost':cost, 'harvest':harvest})
df.sort_values(by="cost", inplace=True) # preferable to sort during plot only
# set up several subplots to show progress.
n_colors = 5; cmap_base = "coolwarm" # a diverging map
fig, axs = plt.subplots(3,2)
ax = axs.flat
Attempt 1 uses hue for the 3rd dim data in barplot. However, this produces a single color for each value in the series, and also seems to do odd things with the bar width & spacing.
import seaborn as sns
sns.barplot(ax=ax[0], x='fruit', y='cost', hue='harvest',
            data=df, palette=cmap_base)
# fix the sns barplot label orientation
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=90)
Attempt 2 uses the pandas DataFrame.plot.bar, with a continuous color range, then adds a colorbar (which needs a scalar mappable). I borrowed some techniques from a Medium post, among others.
import matplotlib as mpl
norm = mpl.colors.Normalize(vmin=min(df.harvest), vmax=max(df.harvest), clip=True)
mapper1 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_base)
colors1 = [mapper1.to_rgba(x) for x in df.harvest]
df.plot.bar(ax=ax[1], x='fruit', y='cost', color=colors1, legend=False)
mapper1._A = []
plt.colorbar(mapper1, ax=ax[1], label='harvest')
Attempt 3 builds on this, borrowing from https://gist.github.com/jakevdp/91077b0cae40f8f8244a to facilitate a discrete colormap.
def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # from https://gist.github.com/jakevdp/91077b0cae40f8f8244a
    base = plt.get_cmap(base_cmap)  # plt.cm.get_cmap was removed in Matplotlib 3.9
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)
cmap_disc = discrete_cmap(n_colors, cmap_base)
mapper2 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_disc)
colors2 = [mapper2.to_rgba(x) for x in df.harvest]
df.plot.bar(ax=ax[2], x='fruit', y='cost', color=colors2, legend=False)
mapper2._A = []
cb = plt.colorbar(mapper2, ax=ax[2], label='harvest')
cb.set_ticks(np.linspace(*mapper2.get_clim(), num=n_colors+1)) # indicate color boundaries
cb.set_ticklabels(["{:.0f}".format(t) for t in cb.get_ticks()]) # without too much precision
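As an aside on the discrete coloring in attempt 3: one possible shortcut, sketched here under assumptions rather than taken from any of the attempts above, is to keep the base colormap as it is and bin the values with matplotlib.colors.BoundaryNorm instead of building a custom colormap. The cost and harvest arrays below are made-up illustration values, and n_bins is a name introduced only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm

# Made-up illustration data (assumption, not the fruit data above)
cost = np.array([3.0, 5.5, 8.0, 10.0, 12.0, 14.5])       # bar heights
harvest = np.array([46.0, 44.0, 41.5, 40.0, 37.0, 35.5])  # mapped to color

n_bins = 5  # hypothetical number of color bins
bounds = np.linspace(harvest.min(), harvest.max(), n_bins + 1)

cmap = plt.get_cmap("coolwarm")
# BoundaryNorm maps each value to the color of the bin it falls into;
# clip=True puts the maximum value into the last bin.
norm = BoundaryNorm(bounds, ncolors=cmap.N, clip=True)
mapper = ScalarMappable(norm=norm, cmap=cmap)
mapper.set_array([])  # harmless on recent Matplotlib, needed on some older versions

fig, ax = plt.subplots()
bars = ax.bar(range(len(cost)), cost, color=mapper.to_rgba(harvest))
cb = fig.colorbar(mapper, ax=ax, label="harvest", ticks=bounds)
```

With this approach the colorbar ticks sit on the bin boundaries without a separate set_ticks call, though whether it counts as simpler than the custom colormap is a judgment call.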
Finally, attempt 4 gives up on showing 3d in one plot and presents the data in 2 parts.
sns.barplot(ax=ax[4], x='fruit', y='cost', data=df, color='C0')
ax[4].set_xticklabels(ax[4].get_xticklabels(), rotation=90)
sns.regplot(x='harvest', y='cost', data=df, ax=ax[5])
(1) is unusable; I'm clearly not using it as intended. (2) is OK with 10 series, but with more series it is harder to tell whether a given sample is above or below average, for instance. (3) is quite nice and scales to 50 bars OK, but it is far from "out-of-the-box" and too involved for a quick analysis. Moreover, the mapper1._A = [] / mapper2._A = [] assignment seems like a hack, but the code fails without it. Perhaps the couple-of-lines solution in (4) is a better way to go.
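On the private-attribute hack specifically: ScalarMappable has a public set_array method that sets the same attribute, so a minimal sketch of the colorbar step without touching _A could look like the following. The data values are made up for illustration, and I believe (but have not checked across versions) that recent Matplotlib releases accept a ScalarMappable in colorbar without any array being set at all.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Made-up illustration data (assumption, not the fruit data above)
cost = np.array([3.0, 5.5, 8.0, 12.0])        # bar heights
harvest = np.array([46.0, 44.0, 41.5, 37.0])  # mapped to color

norm = Normalize(vmin=harvest.min(), vmax=harvest.max())
mapper = ScalarMappable(norm=norm, cmap="coolwarm")
mapper.set_array([])  # public method; replaces the mapper._A = [] assignment

fig, ax = plt.subplots()
# to_rgba accepts the whole array, so no list comprehension is needed
bars = ax.bar(range(len(cost)), cost, color=mapper.to_rgba(harvest))
cb = fig.colorbar(mapper, ax=ax, label="harvest")
```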
To come back to the question again: is it possible to easily produce a bar chart that displays 3d data? I've focused on using a small number of colors for the 3rd dimension for easier identification of trends, but I'm open to other suggestions.
I've posted a solution as well, which uses a lot of custom code to achieve what I can't really believe is not built into some Python graphing library.
Edit: the following code, using R's ggplot, gives a reasonable approximation to (2) with built-in commands.
ggplot(data = df, aes(x = reorder(fruit, +cost), y = cost, fill = harvest)) +
  geom_bar(data = df, aes(fill = harvest), stat = 'identity') +
  scale_fill_gradientn(colours = rev(brewer.pal(7, "RdBu")))
The first 2 lines are more or less the minimal code for a bar plot, and the third changes the color palette.
So if this ease were available in python I'd love to know about it!