I am trying to replicate a chart like the following using a pandas DataFrame and Bokeh vbar:

[Image: Objective]

So far, I've managed to place the labels at their corresponding heights, but I can't find a way to access the numeric x-axis value where each category (2016, 2017, 2018) is located. This is my result:

[Image: My nested categorical stacked bars chart]

This is my code. It's messy, but it's what I've managed so far. Is there a way to access the numeric x-axis position of the bars?

    import math

    import numpy as np
    import pandas as pd
    from bokeh.models import FactorRange, HoverTool, Label
    from bokeh.palettes import brewer
    from bokeh.plotting import figure

    def make_nested_stacked_bars(source, measurement, dimension_attr):
        # dimension_attr is a list with the names of the columns in source used as categories
        # measurement contains the name of the column with numeric data
        dimension_attr = list(dimension_attr)  # work on a copy so the caller's list is not modified

        data = source.copy()
        # Lists the values of the highest (outermost) category level
        list_attr = source[dimension_attr[0]].unique()
        list_stackers = sorted(source[dimension_attr[-1]].unique())

        # Trims labels that are too wide to fit in the graph
        for column in data.columns:
            if data[column].dtype.name == 'object':
                data[column] = np.where(data[column].apply(len) > 30, data[column].str[:30] + '...', data[column])

        # Creates a list of dataframes, one per value of the highest category level
        list_groups = []
        for item in list_attr:
            list_groups.append(data[data[dimension_attr[0]] == item])

        # Drops the highest category level from dimension_attr
        dropped_attr = dimension_attr[0]
        dimension_attr.remove(dropped_attr)

        # Groups by the last 2 parameters, aggregates to count
        # and converts the counts to percentages within each outer group
        for index, group in enumerate(list_groups):
            group = group.groupby(by=dimension_attr).agg({measurement: ['count']})
            group = (100 * group / group.groupby(level=0).transform('sum')).round(1)
            # Resets the index and pivots the last category level into columns
            group = group.reset_index()
            group = group.pivot(index=dimension_attr[0], columns=dimension_attr[1])
            # Uses (category, outer value) tuples as the nested x factors
            group.index = [(x, list_attr[index]) for x in group.index]
            # Drops the measurement name and 'count' as top column levels
            group.columns = group.columns.droplevel(0)
            group.columns = group.columns.droplevel(0)
            list_groups[index] = group

        df = pd.concat(list_groups)

        # Gets the number of colors needed for the plot (brewer palettes have 3 to 11 colors);
        # reversed into a new list so the shared palette is not modified in place
        colors = list(reversed(brewer["Spectral"][len(list_stackers)]))

        p = figure(plot_width=800, plot_height=500, x_range=FactorRange(*df.index))

        # get_item_value() is a helper of mine (not shown) that builds the legend text
        renderers = p.vbar_stack(list_stackers, x='index', width=0.3, fill_color=colors,
                                 legend=[get_item_value(x) for x in list_stackers],
                                 line_color=None, source=df, name=list_stackers)

        # Running total per bar, starting at zero
        list_previous_y = {item: 0 for item in df.index}

        # Loops through the stacked bar renderers, adding a different hover tool to each
        for r in renderers:
            stack = r.name
            hover = HoverTool(tooltips=[(stack, "@%s" % stack)], renderers=[r])

            # Initial value for placing the label on the x-axis
            previous_x = 0.5

            # Loops through dataset rows
            for index, row in df.iterrows():
                value = row[stack]
                # Skips NaN so it does not poison the running total
                if not math.isnan(value):
                    list_previous_y[index] += value
                    # Adds a label only if the value is at least 10
                    if value >= 10:
                        p.add_layout(Label(x=previous_x, y=list_previous_y[index] - value / 2,
                                           text='% ' + str(value), render_mode='css',
                                           border_line_color='black', border_line_alpha=1.0,
                                           background_fill_color='white', background_fill_alpha=1.0))
                # Increases the position on the x-axis
                # This should instead use the actual x position of the next bar
                previous_x = previous_x + 0.8

            p.add_tools(hover)

        p.legend.location = "top_left"
        p.x_range.range_padding = 0.2
        p.xgrid.grid_line_color = None

        return p

Or is there an easier way to get all this done?

Thank you for your time!

UPDATE:

I've added an image of a three-level nested chart, where the labels also need to be placed correctly along the x-axis:

[Image: Three level nested chart]

2 Answers

I can't find a way to access the numeric value where the category (2016,2017,2018) is located in the x axis.

There is no way to access this information on the Python side in standalone Bokeh output. The coordinates are computed only in the browser, on the JavaScript side, i.e. after your Python code has finished running and is out of the picture entirely. Even in a Bokeh server app there is no direct way, as there are no synchronized properties that record these values.

As of Bokeh 1.3.4, support for placing labels with categorical coordinates is a known open issue.

In the meantime, the only workarounds I can suggest are:

  • Use the text glyph method with coordinates in a ColumnDataSource instead of Label. That should work for positioning with actual categorical coordinates (LabelSet might also work, though I have not tried it). You can see an example of text with categorical coordinates here:

    https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/periodic.py

  • Use numerical coordinates to position the Label. You will have to experiment or make a best guess to find numerical coordinates that work for you. As a rule of thumb, each category has a width of 1.0 in synthetic (numeric) coordinate space.
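For the second workaround, the numeric positions can be approximated in Python. The sketch below is not a Bokeh API: the helper `approx_two_level_centers` is hypothetical and assumes that BokehJS lays out two-level factors the way I understand it, with each factor 1.0 wide, the first factor centred at 0.5, `factor_padding` (default 0) between factors in a group, and `group_padding` (default 1.4) between top-level groups. Check the numbers against your own plot before relying on them.

```python
def approx_two_level_centers(factors, group_padding=1.4, factor_padding=0.0):
    """Approximate synthetic x-centres of two-level (group, item) factors.

    Hypothetical helper: an assumed re-implementation of how BokehJS
    places nested factors, not part of Bokeh's Python API.
    """
    # Collect the inner factors of each top-level group, keeping first-seen order
    groups = {}
    for top, inner in factors:
        groups.setdefault(top, []).append(inner)

    centers = {}
    offset = 0.0
    for top, inners in groups.items():
        for i, inner in enumerate(inners):
            # Each factor is 1.0 wide, so the first one is centred at offset + 0.5
            centers[(top, inner)] = offset + 0.5 + i * (1.0 + factor_padding)
        # Skip this group's factors, its inner padding and the gap before the next group
        offset += len(inners) + (len(inners) - 1) * factor_padding + group_padding
    return centers


factors = [("A", "2016"), ("A", "2017"), ("A", "2018"), ("B", "2016"), ("B", "2017")]
for factor, x in approx_two_level_centers(factors).items():
    print(factor, round(x, 2))
```

With the assumed defaults this prints 0.5, 1.5 and 2.5 for group "A" and 4.9 and 5.9 for group "B". If the model is right, the label of a whole top-level group sits at the mean of its factors' centres (1.5 for "A" here).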

My solution was as follows.

I created a copy of the dataframe used to make the chart. This dataframe (labeling_data) contains the y-axis coordinates, calculated so that each label is positioned in the middle of its stacked bar segment. I then added additional columns to be used as the actual labels, where the values to be displayed are concatenated with the percentage symbol.

    import numpy as np

    labeling_data = df.copy()
    # Cumulative sum across the stack columns (top of each segment)
    labeling_data = labeling_data.cumsum(axis=1)
    # New names for the columns
    labeling_data.columns = [item + '_offset' for item in labeling_data.columns]

    # Copies the original columns
    for item in df:
        # Adds the original column
        labeling_data[item] = df[item]
        # Moves the offset to the middle of the bar segment
        labeling_data[item + '_offset'] = labeling_data[item + '_offset'] - labeling_data[item] / 2
        # Concatenates values with the percentage symbol if at least 10
        labeling_data[item + '_label'] = np.where(df[item] >= 10, '% ' + df[item].astype(str), "")
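The same midpoint calculation can also be written without the per-column loop. This is a minimal pandas sketch on made-up data (the column names "Yes", "No", "N/A" and the two-row index are illustrative assumptions, not the original dataset):

```python
import pandas as pd

# Made-up stand-in for the pivoted `df`: rows are bars, columns are stack
# segments (in the same order as passed to vbar_stack), values are percentages
df = pd.DataFrame({"Yes": [60.0, 25.0], "No": [30.0, 70.0], "N/A": [10.0, 5.0]},
                  index=["2016", "2017"])

# Top of every segment is the running total across the row
tops = df.cumsum(axis=1)
# Middle of every segment is its top minus half of its own height
offsets = (tops - df / 2).add_suffix("_offset")
# Label text for segments of at least 10 %, an empty string otherwise
labels = ("% " + df.astype(str)).where(df >= 10, "").add_suffix("_label")

labeling_data = pd.concat([df, offsets, labels], axis=1)
print(labeling_data)
```

Like the loop version, this assumes the column order of df matches the stacking order passed to vbar_stack; otherwise the cumulative sums will not line up with the drawn segments.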

Finally, by looping through the renderers of the plot, a LabelSet was added to each stack group, using labeling_data as the data source. This way, the index of the dataframe can be used as the x-coordinate of the label, and the corresponding columns supply the y-coordinate and text parameters.

    from bokeh.models import ColumnDataSource, LabelSet

    info = ColumnDataSource(labeling_data)

    # Loops through the stacked bar renderers
    for r in renderers:
        stack = r.name

        # One LabelSet per stack group is enough: it already draws a label for
        # every row of the source, so no extra loop over the rows is needed.
        # The index, <stack>_offset and <stack>_label columns are used
        # as the x, y and text parameters
        labels = LabelSet(x='index', y=stack + '_offset', text=stack + '_label', level='overlay',
                          x_offset=-25, y_offset=-5, source=info)
        p.add_layout(labels)

Final result:

[Image: Nested categorical stacked bar chart with labels]
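For reference, here is a compact, self-contained sketch of the same LabelSet idea. Assumptions: the data and colours are made up, the nested factors are stored in an explicit column named `x` rather than in the DataFrame index, and `text_align`/`text_baseline` centre the labels instead of pixel offsets. I have not checked it against Bokeh 1.3.4 specifically.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, FactorRange, LabelSet
from bokeh.plotting import figure

# Made-up percentages; each row is one bar on a nested (group, year) axis
stackers = ["Yes", "No"]
df = pd.DataFrame({
    "x": [("Q1", "2016"), ("Q1", "2017"), ("Q2", "2016")],
    "Yes": [60.0, 25.0, 40.0],
    "No": [40.0, 75.0, 60.0],
})

# Middle of every segment and its label text, mirroring labeling_data
tops = df[stackers].cumsum(axis=1)
for s in stackers:
    df[s + "_offset"] = tops[s] - df[s] / 2
    df[s + "_label"] = "% " + df[s].astype(str)

source = ColumnDataSource(df)
p = figure(x_range=FactorRange(*df["x"]))
p.vbar_stack(stackers, x="x", width=0.8, color=["#2b83ba", "#fdae61"], source=source)

# One LabelSet per stack segment; x uses the categorical factors directly,
# so no numeric x position has to be computed
label_sets = []
for s in stackers:
    labels = LabelSet(x="x", y=s + "_offset", text=s + "_label", source=source,
                      text_align="center", text_baseline="middle")
    p.add_layout(labels)
    label_sets.append(labels)
```

Because the x values are the factors themselves, the same pattern should carry over to three-level tuples such as ("Q1", "2016", "A"), though that case is untested here.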