How to detach the height of stacked bars from the fill color?

I have multiple categories that I want to present in a stacked bar chart so that the height represents the value and the color is conditionally defined by another variable (something like fill= in ggplot).

I am new to Bokeh and struggling with the stacked bar chart mechanics. I tried to construct this type of chart, but I got nothing except all sorts of errors. The stacked bar chart examples in the Bokeh documentation are very limited.

My data is stored in a pandas DataFrame:

data = [
    ['A', 1, 15, 1],
    ['A', 2, 14, 2],
    ['A', 3, 60, 1],
    ['B', 1, 15, 2],
    ['B', 2, 25, 2],
    ['B', 3, 20, 1],
    ['C', 1, 15, 1],
    ['C', 2, 25, 1],
    ['C', 3, 55, 2],
    ...
]

Columns represent Category, Regime, Value, State.

I want to plot Category on x axis, Regimes stacked on y axis where bar length represents Value and color represents State.

Is this achievable in Bokeh? Can anybody demonstrate, please?

The stacked bar documentation has colors which do not depend on height. Have you seen the documentation? You might want to combine it with factor_cmap so that you don't have to assign colors manually. - syntonym
I have seen the examples, but they don't apply to my case. I want to detach the height and the colors. Imagine that the example from the documentation also had a column giving the top export country per fruit per year. The stacks (without any colors) are sorted, so that the lowest represents 2015, the second 2016, and so on. The height of each bar represents the number of fruits per year. The colors could, for example, represent the top export country for each year and fruit type. Is it possible to plot it that way in Bokeh? - Chris
My intention is to make the colors depend on an extra variable, so that the chart presents another layer of information. - Chris
Is (category + regime) a unique key for your data? - syntonym
This turned out much harder than it looked to me. I guess vbar_stack complicates more than it helps here. I can also write a (complete) version using only stack if that would help with understanding. - syntonym

1 Answer

I think this problem becomes much easier if you transform your data to the following form:

from bokeh.plotting import figure
from bokeh.io import show
from bokeh.transform import stack, factor_cmap
import pandas as pd

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4], 
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5], 
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6], 
    "Regime3_State": ["B", "A"]})

p = figure(x_range=["a", "b"])
p.vbar_stack(["Regime1_Value", "Regime2_Value", "Regime3_Value"],
        x="Category",
        fill_color=[
            factor_cmap(state, palette=["red", "green"], factors=["A", "B"]) 
            for state in ["Regime1_State","Regime2_State", "Regime3_State"]],
        line_color="black",
        width=0.9,
        source=df)

show(p)
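
For reference, here is one possible way to get from the long format in the question (one row per Category and Regime) to the wide form used above. This is only a sketch: the names long_df and wide are made up for illustration, only the nine rows shown in the question are used, and the State values are converted to strings on the assumption that they will be fed to factor_cmap as factors.

```python
import pandas as pd

# Long format as in the question: one row per (Category, Regime).
long_df = pd.DataFrame(
    [("A", 1, 15, 1), ("A", 2, 14, 2), ("A", 3, 60, 1),
     ("B", 1, 15, 2), ("B", 2, 25, 2), ("B", 3, 20, 1),
     ("C", 1, 15, 1), ("C", 2, 25, 1), ("C", 3, 55, 2)],
    columns=["Category", "Regime", "Value", "State"],
)

# factor_cmap works on categorical (string) factors, so store State as str.
long_df["State"] = long_df["State"].astype(str)

# One row per Category, one (Value, State) column pair per Regime.
wide = long_df.pivot(index="Category", columns="Regime",
                     values=["Value", "State"])

# Flatten the (kind, regime) column tuples into names like "Regime1_Value".
wide.columns = [f"Regime{regime}_{kind}" for kind, regime in wide.columns]
wide = wide.reset_index()

# A missing (Category, Regime) pair would show up as NaN; a stacked bar
# needs a number there, so fill missing values with 0.
value_cols = [c for c in wide.columns if c.endswith("_Value")]
wide[value_cols] = wide[value_cols].fillna(0)

print(wide)
```

The resulting frame can then be passed as source, with the generated column names in place of the hand-written ones.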

This is a bit strange, because vbar_stack behaves unlike a "normal" glyph method. Normally you have three options for the attributes of a renderer (assume we want to plot n dots/rectangles/shapes/things):

  • Give a single value that is used for all n glyphs
  • Give a column name that is looked up in the source (source[column_name] must produce an "array" of length n)
  • Give an array of length n of data

But vbar_stack does not create one renderer; it creates as many as there are elements in the first array you give. Let's call this number k. Then, to make sense of the attributes, you again have three options:

  • Give a single value that is used for all glyphs
  • Give an array of k things that are used as column names in the source (each lookup must produce an array of length n).
  • Give an array of length n of data (so all k renderers get the same data).

So p.vbar(x=[a,b,c]) and p.vbar_stack(x=[a,b,c]) actually do different things (the first passes literal data, the second passes column names), which is confusing and not clear from the documentation.
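
The "k renderers" point can be checked directly, since vbar_stack returns the list of renderers it created. A minimal sketch, with made-up column names v1, v2, v3 and a length-k list of plain colors standing in for the per-renderer attribute:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# n = 2 rows in the source, so each renderer draws 2 bars.
source = ColumnDataSource({
    "Category": ["a", "b"],
    "v1": [1, 4],
    "v2": [2, 5],
    "v3": [3, 6],
})

p = figure(x_range=["a", "b"])
renderers = p.vbar_stack(
    ["v1", "v2", "v3"],                   # k = 3 stackers -> 3 renderers
    x="Category",                         # one column name shared by all renderers
    width=0.9,                            # one value shared by all renderers
    fill_color=["red", "green", "blue"],  # length k: one entry per renderer
    source=source,
)

# One renderer per stacker, each with its own entry of the length-k list.
print(len(renderers))
print([r.glyph.fill_color for r in renderers])
```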

But why do we have to transform your data so strangely? Let's unroll vbar_stack and write it on our own (details left out for brevity):

plotted_regimes = []

for regime in regimes:
    if not plotted_regimes:
        bottom = 0
    else:
        bottom = stack(*plotted_regimes)
    p.vbar(bottom=bottom, top=stack(*plotted_regimes, regime))
    plotted_regimes.append(regime)

So for each regime we have a separate vbar whose bottom sits where the sum of the previous regimes ended. With the original data structure this is not really possible, because there doesn't need to be a value for each regime in each category. In the wide form we are forced to provide every value, setting it to 0 where none exists.

Because the stacked values correspond to column names, we have to put these values in one dataframe. The vbar_stack call at the beginning could also be written with stack (basically because vbar_stack is a convenience wrapper around stack).
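
A complete version written with stack and plain vbar calls might look roughly like this. It is a sketch that reuses the example dataframe from the beginning of the answer; the loop variables and the renderers list are illustrative names, and show(p) is left commented out so the snippet runs without opening a browser.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, stack

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})
source = ColumnDataSource(df)

p = figure(x_range=["a", "b"])

plotted = []    # value columns that are already stacked below the current bar
renderers = []
for regime in ["Regime1", "Regime2", "Regime3"]:
    value_col = f"{regime}_Value"
    state_col = f"{regime}_State"
    # The first bar starts at 0; later bars start at the sum of the earlier ones.
    bottom = stack(*plotted) if plotted else 0
    renderer = p.vbar(
        x="Category",
        bottom=bottom,
        top=stack(*plotted, value_col),
        width=0.9,
        # Color comes from this regime's state column, not from its height.
        fill_color=factor_cmap(state_col, palette=["red", "green"],
                               factors=["A", "B"]),
        line_color="black",
        source=source)
    renderers.append(renderer)
    plotted.append(value_col)

# from bokeh.io import show; show(p)  # uncomment to render the chart
```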

The factor_cmap is used so that we don't have to manually assign colors. We could also simply add a Regime1_Color column, but this way the mapping is done automatically (and client side).
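
For comparison, the manual alternative of adding color columns could be sketched as follows. The state_colors mapping is an assumption that mirrors the palette used above, and the sketch only builds the columns. I would expect passing their names as the fill_color list to vbar_stack to work, since strings that are not color names are treated as column names, but I have not verified that here.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})

# Hypothetical state -> color mapping, mirroring the factor_cmap palette.
state_colors = {"A": "red", "B": "green"}

# Add one RegimeN_Color column per RegimeN_State column (done server side).
for regime in ["Regime1", "Regime2", "Regime3"]:
    df[f"{regime}_Color"] = df[f"{regime}_State"].map(state_colors)

print(df[["Category", "Regime1_Color", "Regime2_Color", "Regime3_Color"]])
```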