I think this problem becomes much easier if you transform your data to the following form:
from bokeh.plotting import figure
from bokeh.io import show
from bokeh.transform import stack, factor_cmap
import pandas as pd

# One row per category; each regime gets a value column and a state column.
df = pd.DataFrame({
    "Category": ["a", "b"],
    "Regime1_Value": [1, 4],
    "Regime1_State": ["A", "B"],
    "Regime2_Value": [2, 5],
    "Regime2_State": ["B", "B"],
    "Regime3_Value": [3, 6],
    "Regime3_State": ["B", "A"]})

p = figure(x_range=["a", "b"])
p.vbar_stack(["Regime1_Value", "Regime2_Value", "Regime3_Value"],
             x="Category",
             # one color mapper per stacked layer, keyed on that layer's state column
             fill_color=[
                 factor_cmap(state, palette=["red", "green"], factors=["A", "B"])
                 for state in ["Regime1_State", "Regime2_State", "Regime3_State"]],
             line_color="black",
             width=0.9,
             source=df)
show(p)
This is a bit strange, because vbar_stack behaves unlike a "normal glyph". Normally you have three options for attributes of a renderer (assume we want to plot n dots/rectangles/shapes/things):
- Give a single value that is used for all n glyphs
- Give a column name that is looked up in the source (source[column_name] must produce an "array" of length n)
- Give an array of length n of data
But vbar_stack does not create one renderer; it creates as many as there are elements in the first array you give. Let's call this number k. Then to make sense of the attributes you again have three options:
- Give a single value that is used for all glyphs
- Give an array of k things that are used as column names in the source (each lookup must produce an array of length n).
- Give an array of length n of data (so all k glyphs share the same data).
So p.vbar(x=[a,b,c]) and p.vbar_stack(x=[a,b,c]) actually do different things (the first gives literal data, the second gives column names), which confused me, and it's not clear from the documentation.
But why do we have to transform your data so strangely? Let's unroll vbar_stack and write it on our own (details left out for brevity):
plotted_regimes = []
for regime in regimes:
    if not plotted_regimes:
        bottom = 0
    else:
        bottom = stack(*plotted_regimes)
    p.vbar(bottom=bottom, top=stack(*plotted_regimes, regime))
    plotted_regimes.append(regime)
So for each regime we have a separate vbar whose bottom sits where the sum of the previously plotted regimes ended. With the original data structure this is not really possible, because there doesn't need to be a value for each regime for each category. Here we are forced to set the missing values to 0 explicitly.
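To make the bottom/top bookkeeping concrete, here is a minimal plain-Python sketch of the arithmetic that stacking amounts to, using the value columns from the example above. It does not call Bokeh at all; the helper name stacked_extents is made up for illustration.

```python
# Plain-Python sketch of the stacking arithmetic; no Bokeh involved.
# `stacked_extents` is a hypothetical helper, not part of any library.
columns = {
    "Regime1_Value": [1, 4],
    "Regime2_Value": [2, 5],
    "Regime3_Value": [3, 6],
}
regimes = ["Regime1_Value", "Regime2_Value", "Regime3_Value"]


def stacked_extents(columns, regimes):
    """Return {regime: (bottoms, tops)}, one entry per category in each list."""
    n = len(columns[regimes[0]])
    running = [0] * n  # sum of all regimes handled so far, per category
    extents = {}
    for regime in regimes:
        bottoms = list(running)
        tops = [b + v for b, v in zip(bottoms, columns[regime])]
        extents[regime] = (bottoms, tops)
        running = tops  # the next regime starts where this one ended
    return extents


extents = stacked_extents(columns, regimes)
for regime in regimes:
    print(regime, extents[regime])
```

Every regime needs one number per category for the running sum to be defined, which is why a missing value has to be written as 0.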
Because the stacked values correspond to column names, we have to put these values in one dataframe. The vbar_stack call at the beginning could also be written with stack (basically because vbar_stack is a convenience wrapper around stack).
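As a rough sketch of the reshaping step, assume the original data is a list of (category, regime, value, state) records in which some category/regime combinations are simply absent. That input layout, the helper name to_wide and the fill_state placeholder are all assumptions for illustration, not something taken from the question.

```python
# Sketch: turn "long" records into the wide column layout used above.
# The record layout and `to_wide` are assumptions made for this example.
records = [
    ("a", "Regime1", 1, "A"),
    ("a", "Regime2", 2, "B"),
    ("b", "Regime1", 4, "B"),
    ("b", "Regime3", 6, "A"),
]


def to_wide(records, fill_state="A"):
    """Build one <regime>_Value and one <regime>_State column per regime."""
    categories = sorted({r[0] for r in records})
    regimes = sorted({r[1] for r in records})
    row = {category: i for i, category in enumerate(categories)}

    wide = {"Category": categories}
    for regime in regimes:
        # Missing combinations get value 0, so they add nothing to the stack.
        wide[f"{regime}_Value"] = [0] * len(categories)
        # The state of a zero-height bar is arbitrary; fill_state is a placeholder.
        wide[f"{regime}_State"] = [fill_state] * len(categories)

    for category, regime, value, state in records:
        wide[f"{regime}_Value"][row[category]] = value
        wide[f"{regime}_State"][row[category]] = state
    return wide


wide = to_wide(records)
print(wide)
```

The resulting dict of equal-length lists can be passed to pd.DataFrame(wide) to get a frame with the same column layout as df above.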
The factor_cmap is used so that we don't have to manually assign colors. We could also simply add a Regime1_Color column, but this way the mapping is done automatically (and client side).
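For comparison, the manual alternative could look roughly like the sketch below, which derives a *_Color column from each *_State column in plain Python. The helper name add_color_columns and the "gray" fallback are assumptions for illustration.

```python
# Sketch: derive <regime>_Color columns by hand instead of using factor_cmap.
# `add_color_columns` and the "gray" fallback are illustrative assumptions.
state_colors = {"A": "red", "B": "green"}

columns = {
    "Regime1_State": ["A", "B"],
    "Regime2_State": ["B", "B"],
    "Regime3_State": ["B", "A"],
}


def add_color_columns(columns, state_colors, default="gray"):
    """Return a copy of columns with one *_Color column per *_State column."""
    out = dict(columns)
    for name, states in columns.items():
        if name.endswith("_State"):
            color_name = name[: -len("_State")] + "_Color"
            out[color_name] = [state_colors.get(s, default) for s in states]
    return out


with_colors = add_color_columns(columns, state_colors)
print(with_colors["Regime1_Color"])
```

With such columns in the source, the list of color column names should be usable as fill_color in place of the factor_cmap list, at the cost of doing the mapping in Python rather than in the browser.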
vbar_stack complicates more than it helps here. I can also write a (complete) version using only stack if that would help with understanding. – syntonym