Create a stacked bar chart of the N largest columns per row in a dataframe

Question

I have a dataframe of N columns of values by M dates.

I'm looking to plot a stacked bar chart of the 3 largest values per date.

Test dataframe:

import pandas
import numpy

data = {
    'A': [ 65, 54, 12, 14, 30, numpy.nan ],
    'B': [ 54, 47, 60, 34, 40, 35 ],
    'C': [ 34, 39, 57, 56, 48, numpy.nan ],
    'D': [ 20, 18, 47, 47, 35, 70 ]
}

df = pandas.DataFrame(index=pandas.date_range('2018-01-01', '2018-01-06').date,
                      data=data,
                      dtype=numpy.float64)

               A     B     C     D
2018-01-01  65.0  54.0  34.0  20.0
2018-01-02  54.0  47.0  39.0  18.0
2018-01-03  12.0  60.0  57.0  47.0
2018-01-04  14.0  34.0  56.0  47.0
2018-01-05  30.0  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0

Extracting the 3 largest values per row:

I have found nlargest which I can use to extract the 3 largest columns and their respective values for each row:

for date,row in df.iterrows():
    top = row.nlargest(3)
    s = [f'{c}={v}' for c,v in top.iteritems()]
    print('{}: [ {} ]'.format(date, ', '.join(s)))

2018-01-01: [ A=65.0, B=54.0, C=34.0 ]
2018-01-02: [ A=54.0, B=47.0, C=39.0 ]
2018-01-03: [ B=60.0, C=57.0, D=47.0 ]
2018-01-04: [ C=56.0, D=47.0, B=34.0 ]
2018-01-05: [ C=48.0, B=40.0, D=35.0 ]
2018-01-06: [ D=70.0, B=35.0 ]

Plotting the data in a stacked bar chart:

The final step, to take the above data and plot a stacked bar chart so that it looks like the example below, I have been unsuccessful in.

I'm not even sure if nlargest is the best approach.

Desired output:

Question:

How can I create a stacked bar chart of the N largest columns per row in a dataframe?

this seems to provide what you are looking for: matplotlib.org/gallery/lines_bars_and_markers/bar_stacked.html — Tacratis

Jon Clements Jon Clements · Accepted Answer · 2019-03-26T17:29:40

Starting with your input df:

top3_by_date = (
    # bring the date back as a column to use as a grouping var
    df.reset_index()
    # make a long DF of date/column/name value
    .melt(id_vars='index')
    # order DF by highest values first
    .sort_values('value', ascending=False)
    # group by the index and take the first 3 rows of each
    .groupby('index')
    .head(3)
    # pivot back so we've got an X & Y to chart...
    .pivot('index', 'variable')
    # drop the value level as we don't need that
    .droplevel(level=0, axis=1)
)

This gives:

variable       A     B     C     D
index                             
2018-01-01  65.0  54.0  34.0   NaN
2018-01-02  54.0  47.0  39.0   NaN
2018-01-03   NaN  60.0  57.0  47.0
2018-01-04   NaN  34.0  56.0  47.0
2018-01-05   NaN  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0

Then you can do top3_by_date.plot.bar(stacked=True), which should give you something similar to:

Create a stacked bar chart of the N largest columns per row in a dataframe

3 Answers