
My aim is to show three-dimensional data in a bar chart: a categorical x, plus two continuous series y1 and y2. The bar heights should come from y1 and the bar colors should indicate y2.

This does not seem to be particularly obscure to me, but I didn't find a simple / built-in way to use a bar chart to visualise three dimensions -- I'm thinking mostly for exploratory purposes, before investigating relationships more formally.

Am I missing a type of plot in the libraries? Is there a good alternative way of showing 3d data?

Anyway here are some things that I've tried that aren't particularly satisfying:


Some data for these attempts

import pandas as pd                                                             
import numpy as np                                                              
import matplotlib.pyplot as plt                                                 
# Example data with explicit (-ve) correlation in the two series                
n = 10; sd = 2.5                                                                
fruits = [ 'Lemon', 'Cantaloupe', 'Redcurrant', 'Raspberry', 'Papaya',          
          'Apricot', 'Cherry', 'Durian', 'Guava', 'Jujube']                     
np.random.seed(101)                                                             
cost    = np.random.uniform(3, 15, n)                                           
harvest = 50 - (np.random.randn(n) * sd  + cost)                                
df = pd.DataFrame(data={'fruit':fruits, 'cost':cost, 'harvest':harvest})                                                                                
df.sort_values(by="cost", inplace=True) # preferable to sort during plot only
# set up several subplots to show progress.                                     
n_colors = 5; cmap_base = "coolwarm" # a diverging map                          
fig, axs = plt.subplots(3,2)                                             
ax = axs.flat    

Attempt 1 uses hue for the 3rd dim data in barplot. However, this produces a single color for each value in the series, and also seems to do odd things with the bar width & spacing.

import seaborn as sns                                                           
sns.barplot(ax=ax[0], x='fruit', y='cost', hue='harvest', 
    data=df, palette=cmap_base)
# fix the sns barplot label orientation                                         
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=90)                     

Attempt 2 uses the pandas DataFrame.plot.bar with a continuous color range, then adds a colorbar (which needs a ScalarMappable). I borrowed some techniques from a Medium post, among others.

import matplotlib as mpl                                                        
norm = mpl.colors.Normalize(vmin=min(df.harvest), vmax=max(df.harvest), clip=True)
mapper1 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_base)                      
colors1 = [mapper1.to_rgba(x) for x in df.harvest]                              
df.plot.bar(ax=ax[1], x='fruit', y='cost', color=colors1, legend=False)         
mapper1._A = []                                                                 
plt.colorbar(mapper1, ax=ax[1], label='harvest')

Attempt 3 builds on this, borrowing from https://gist.github.com/jakevdp/91077b0cae40f8f8244a to facilitate a discrete colormap.

def discrete_cmap(N, base_cmap=None):                                           
    """Create an N-bin discrete colormap from the specified input map"""        
    # from https://gist.github.com/jakevdp/91077b0cae40f8f8244a                 
    base = plt.get_cmap(base_cmap)  # plt.cm.get_cmap is gone from recent matplotlib
    color_list = base(np.linspace(0, 1, N))                                     
    cmap_name = base.name + str(N)                                              
    return base.from_list(cmap_name, color_list, N)                             

cmap_disc = discrete_cmap(n_colors, cmap_base)                                  
mapper2 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_disc)                      
colors2 = [mapper2.to_rgba(x) for x in df.harvest]                              
df.plot.bar(ax=ax[2], x='fruit', y='cost', color=colors2, legend=False)         
mapper2._A = []                                                                 
cb = plt.colorbar(mapper2, ax=ax[2], label='harvest')
cb.set_ticks(np.linspace(*mapper2.get_clim(), num=n_colors+1))  # indicate color boundaries
cb.set_ticklabels(["{:.0f}".format(t) for t in cb.get_ticks()]) # without too much precision
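
As a possible shortcut for the discrete version, matplotlib's BoundaryNorm can bin a continuous colormap without building a custom one. The sketch below is only an illustration under assumptions: the demo frame and its values are made up (same column names as above), and the equal-width bin edges are one choice among many.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm

# Hypothetical stand-in data with the same column names as the example above.
demo = pd.DataFrame({
    "fruit": ["Lemon", "Papaya", "Cherry", "Durian", "Guava", "Jujube"],
    "cost": [4.0, 6.5, 8.0, 9.5, 12.0, 14.0],
    "harvest": [45.0, 44.0, 40.5, 41.0, 37.0, 35.5],
})

n_bins = 5
cmap = plt.get_cmap("coolwarm")
# n_bins + 1 equally spaced edges spanning the data range (an assumption;
# any increasing sequence of edges would do).
edges = np.linspace(demo["harvest"].min(), demo["harvest"].max(), n_bins + 1)
# BoundaryNorm maps each interval between consecutive edges to one colormap index.
norm = BoundaryNorm(edges, ncolors=cmap.N)

fig, ax = plt.subplots()
bar_colors = cmap(norm(demo["harvest"].to_numpy()))  # one RGBA row per bar
ax.bar(demo["fruit"], demo["cost"], color=bar_colors)
ax.set_ylabel("cost")
ax.tick_params(axis="x", rotation=90)

sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])  # only needed on older matplotlib releases
cb = fig.colorbar(sm, ax=ax, label="harvest")
```

The colorbar is drawn in discrete blocks because the norm itself carries the bin edges, so no separate tick bookkeeping is strictly required.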

Finally, attempt 4 gives up on showing 3d in one plot and presents the data in two parts.

sns.barplot(ax=ax[4], x='fruit', y='cost', data=df, color='C0')                 
ax[4].set_xticklabels(ax[4].get_xticklabels(), rotation=90)                                                                                                 
sns.regplot(x='harvest', y='cost', data=df, ax=ax[5])                                                                   

(1) is unusable; I'm clearly not using hue as intended. (2) is OK with 10 bars, but with more bars it gets harder to tell whether, for instance, a given sample is above or below average. (3) is quite nice and scales to 50 bars OK, but it is far from "out-of-the-box" and too involved for a quick analysis. Moreover, the mapper1._A = [] assignment (and likewise for mapper2) seems like a hack, but the code fails without it. Perhaps the couple of lines in (4) are a better way to go.
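
On the above/below-average point in (2), one possible tweak is to centre the diverging colormap on the mean, so that the neutral color marks "average". The sketch below assumes matplotlib's TwoSlopeNorm is available (matplotlib 3.2 or later, as far as I know); harvest_demo is made-up data with a deliberate low outlier.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Hypothetical values for the color dimension, including one low outlier.
harvest_demo = np.array([45.0, 44.0, 40.5, 41.0, 37.0, 20.0])

center = harvest_demo.mean()
# Values below the mean map to [0, 0.5), values above it to (0.5, 1],
# each side scaled independently, so the mean always lands on the midpoint.
norm_centered = TwoSlopeNorm(vcenter=center,
                             vmin=harvest_demo.min(),
                             vmax=harvest_demo.max())

cmap = plt.get_cmap("coolwarm")
colors_centered = cmap(norm_centered(harvest_demo))  # one RGBA row per value
```

The resulting colors_centered array can be passed as color= to the bar call in attempt 2; with "coolwarm", below-average bars come out bluish and above-average bars reddish.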


To come back to the question again: is it possible to easily produce a bar chart that displays 3d data? I've focused on using a small number of colors for the 3rd dimension, for easier identification of trends, but I'm open to other suggestions.

I've posted a solution as well, which uses a lot of custom code to achieve what I can't really believe is not built into some Python graphing library.


edit: the following code, using R's ggplot, gives a reasonable approximation to (2) with built-in commands.

ggplot(data = df, aes(x =reorder(fruit, +cost), y = cost, fill=harvest)) +
  geom_bar(data=df, aes(fill=harvest), stat='identity') +
  scale_fill_gradientn(colours=rev(brewer.pal(7,"RdBu")))

The first 2 lines are more or less the minimal code for barplot, and the third changes the color palette.
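
For comparison, the nearest I can get in plain matplotlib is a small wrapper along the lines of the sketch below. Note that bar_with_color is a hypothetical helper of my own, not a library function, and demo is made-up data; the sort by height mirrors the reorder() call in the ggplot version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize


def bar_with_color(data, x, height, color_by, cmap="coolwarm", ax=None):
    """Hypothetical helper: bar heights from `height`, bar colors from `color_by`."""
    if ax is None:
        _, ax = plt.subplots()
    cmap_obj = plt.get_cmap(cmap)
    norm = Normalize(vmin=data[color_by].min(), vmax=data[color_by].max())
    ordered = data.sort_values(height)  # like reorder(fruit, +cost) in ggplot
    ax.bar(ordered[x], ordered[height],
           color=cmap_obj(norm(ordered[color_by].to_numpy())))
    ax.set_xlabel(x)
    ax.set_ylabel(height)
    ax.tick_params(axis="x", rotation=90)
    sm = ScalarMappable(norm=norm, cmap=cmap_obj)
    sm.set_array([])  # only needed on older matplotlib releases
    ax.figure.colorbar(sm, ax=ax, label=color_by)
    return ax


# Made-up data with the same column names as the example above.
demo = pd.DataFrame({
    "fruit": ["Lemon", "Papaya", "Cherry", "Durian"],
    "cost": [9.5, 4.0, 12.0, 6.5],
    "harvest": [41.0, 45.0, 37.0, 44.0],
})
ax_demo = bar_with_color(demo, x="fruit", height="cost", color_by="harvest")
```

At the point of use this is one line, but the helper still has to be carried around, which is the part ggplot makes unnecessary.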

So if this ease were available in python I'd love to know about it!

1
I'm finding this question hard to understand. For one, (2) and (3) seem to be valid solutions to the problem of plotting a bar graph with a third dimension encoded in the bars' colors. So the question should go into more detail about what the desired outcome is. Somehow the answer given defines the desired outcome, yet it is formulated in a way that leaves room for it not being the definitive answer. If there is anything you want to know, it would make sense to ask for it directly. - ImportanceOfBeingErnest
Thanks for taking the time to give me feedback. I initially wanted a quick way to visualise the 2nd continuous dimension -- a one-liner if possible. Since one wasn't forthcoming, I got a bit into the weeds of the specific dataset (non-uniform, with outliers as well, which makes choosing the color range trickier). I would be content to achieve (2) without having to manually fiddle with normalisation etc. But the extra effort of invoking normalizers and scalar mappables brings the temptation to try to pick a "good" color mapping. - Bonlenfum

1 Answer


I'm posting an answer that meets my aims: it is simple at the point of use, is still useful with ~100 bars and, by leveraging the Fisher-Jenks 1d classifier from PySAL, handles outliers quite well (see the post about d3 coloring). Overall, though, it is quite involved: the BinnedColorScaler class, posted at the bottom, is 50+ lines.

# set up the color binner
quantizer = BinnedColorScaler(df.harvest, k=5, cmap='coolwarm' )
# and plot dataframe with it.
df.plot.bar(ax=ax, x='fruit', y='cost', 
            color=df.harvest.map(quantizer.map_by_class))
quantizer.add_legend(ax, title='harvest') # show meaning of bins in legend

This relies on the following class, which uses a nice 1d classifier from PySAL and borrows ideas from the geoplot/geopandas libraries.


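import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# older PySAL layout; newer releases ship this classifier in the separate
# mapclassify package as mapclassify.FisherJenks
from pysal.esda.mapclassify import Fisher_Jenks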
class BinnedColorScaler(object):
    '''
    give this an array-like data set, a bin count, and a colormap name, and it
    - quantizes the data
    - provides a bin lookup and a color mapper that can be used by pandas for selecting artist colors
    - provides a method for a legend to display the colors and bin ranges

    '''
    def __init__(self, values, k=5, cmap='coolwarm'):
        self.base_cmap = plt.get_cmap(cmap) # can be None, text, or a cmap instance
        self.bin_colors = self.base_cmap(np.linspace(0, 1, k)) # evenly-spaced colors

        # produce bins - see _discrete_colorize in geoplot.geoplot.py:2372
        self.binning = Fisher_Jenks(np.array(values), k)
        # lowest edge is the data minimum (binning.yb holds bin ids, not data values)
        self.bin_edges = np.array([np.min(values)] + self.binning.bins.tolist())
        # some text for the legend (as per geopandas approx)
        self.categories = [
            '{0:.2f} - {1:.2f}'.format(self.bin_edges[i], self.bin_edges[i + 1])
            for i in range(len(self.bin_edges) - 1)]

    def map_by_class(self, val):
        ''' return a color for a given data value '''
        #bin_id = self.binning.find_bin(val)
        bin_id = self.find_bin(val)
        return self.bin_colors[bin_id]

    def find_bin(self, x):
        ''' unfortunately the pysal implementation seems to fail on bin edge
        cases :(. So reimplement with the way we expect here.
        '''
        # wow, subtle. just <= instead of < in the uptos
        x = np.asarray(x).flatten()
        uptos = [np.where(value <= self.binning.bins)[0] for value in x]
        bins = [v.min() if v.size > 0 else len(self.binning.bins)-1 for v in uptos] #bail upwards
        bins = np.asarray(bins)
        if len(bins) == 1:
            return bins[0]
        else:
            return bins

    def add_legend(self, ax, title=None, **kwargs):
        ''' add legend showing the discrete colors and the corresponding data range '''
        # following the geoplot._paint_hue_legend functionality, approx.
        # generate a patch for each color in the set
        artists, labels = [], []
        for i in range(len(self.bin_colors)):
            labels.append(self.categories[i])
            artists.append(mpl.lines.Line2D(
                (0,0), (1,0), mfc='none', marker='None', ls='-', lw=10,
                color=self.bin_colors[i]))

        return ax.legend(artists, labels, fancybox=True, title=title, **kwargs)
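
The edge handling in find_bin (a value equal to an upper edge belongs to that bin, and anything above the top edge falls into the last bin) can also be sketched with np.searchsorted. This is only an illustration of the lookup rule, not part of the class: find_bin_sorted and edges_demo are hypothetical names, and the edges are made-up numbers rather than Fisher-Jenks output.

```python
import numpy as np


def find_bin_sorted(x, upper_edges):
    """Index of the first bin whose upper edge is >= x.

    Values above the top edge are assigned to the last bin ("bail upwards").
    """
    upper_edges = np.asarray(upper_edges)
    # side="left" returns the first index i with x <= upper_edges[i],
    # i.e. upper edges are inclusive.
    idx = np.searchsorted(upper_edges, np.asarray(x), side="left")
    return np.minimum(idx, len(upper_edges) - 1)


edges_demo = np.array([10.0, 20.0, 30.0])  # hypothetical upper bin edges
bin_ids = find_bin_sorted([5.0, 10.0, 10.5, 30.0, 99.0], edges_demo)
print(bin_ids)  # [0 0 1 2 2]
```

Unlike the list comprehension above, this handles a whole array in one vectorised call, although it always returns an array, even for a single value.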