1
votes

I have a dataframe of N columns of values by M dates.

I'm looking to plot a stacked bar chart of the 3 largest values per date.

Test dataframe:

import pandas
import numpy

data = {
    'A': [ 65, 54, 12, 14, 30, numpy.nan ],
    'B': [ 54, 47, 60, 34, 40, 35 ],
    'C': [ 34, 39, 57, 56, 48, numpy.nan ],
    'D': [ 20, 18, 47, 47, 35, 70 ]
}

df = pandas.DataFrame(index=pandas.date_range('2018-01-01', '2018-01-06').date,
                      data=data,
                      dtype=numpy.float64)
               A     B     C     D
2018-01-01  65.0  54.0  34.0  20.0
2018-01-02  54.0  47.0  39.0  18.0
2018-01-03  12.0  60.0  57.0  47.0
2018-01-04  14.0  34.0  56.0  47.0
2018-01-05  30.0  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0

Extracting the 3 largest values per row:

I have found nlargest which I can use to extract the 3 largest columns and their respective values for each row:

for date,row in df.iterrows():
    top = row.nlargest(3)
    s = [f'{c}={v}' for c,v in top.iteritems()]
    print('{}: [ {} ]'.format(date, ', '.join(s)))
2018-01-01: [ A=65.0, B=54.0, C=34.0 ]
2018-01-02: [ A=54.0, B=47.0, C=39.0 ]
2018-01-03: [ B=60.0, C=57.0, D=47.0 ]
2018-01-04: [ C=56.0, D=47.0, B=34.0 ]
2018-01-05: [ C=48.0, B=40.0, D=35.0 ]
2018-01-06: [ D=70.0, B=35.0 ]

Plotting the data in a stacked bar chart:

The final step, to take the above data and plot a stacked bar chart so that it looks like the example below, I have been unsuccessful in.

I'm not even sure if nlargest is the best approach.

Desired output:

stacked barchart example

Question:

How can I create a stacked bar chart of the N largest columns per row in a dataframe?

3
this seems to provide what you are looking for: matplotlib.org/gallery/lines_bars_and_markers/bar_stacked.htmlTacratis

3 Answers

4
votes

Starting with your input df:

top3_by_date = (
    # bring the date back as a column to use as a grouping var
    df.reset_index()
    # make a long DF of date/column/name value
    .melt(id_vars='index')
    # order DF by highest values first
    .sort_values('value', ascending=False)
    # group by the index and take the first 3 rows of each
    .groupby('index')
    .head(3)
    # pivot back so we've got an X & Y to chart...
    .pivot('index', 'variable')
    # drop the value level as we don't need that
    .droplevel(level=0, axis=1)
)

This gives:

variable       A     B     C     D
index                             
2018-01-01  65.0  54.0  34.0   NaN
2018-01-02  54.0  47.0  39.0   NaN
2018-01-03   NaN  60.0  57.0  47.0
2018-01-04   NaN  34.0  56.0  47.0
2018-01-05   NaN  40.0  48.0  35.0
2018-01-06   NaN  35.0   NaN  70.0

Then you can do top3_by_date.plot.bar(stacked=True), which should give you something similar to:

enter image description here

0
votes

It is possible, but somewhat convoluted, since you need to use bottom to offset each bar above those at the same date with lower values. This prevents bars with higher values hiding bars with lower values.

For each column (representing one series in the bar chart), 3 arrays are required:

  • dates: the dates which have values for this column (ie: the dates for which this column is one of the 3 largest values)
  • values: the difference between this value and the next lower value
  • bottoms: the value of the next lower value

Building up the arrays:

col_dates   = collections.defaultdict(list)
col_values  = collections.defaultdict(list)
col_bottoms = collections.defaultdict(list)

for date,row in df.iterrows():
    top = row.nlargest(3)
    for i,kv in enumerate(top.iteritems()):
        col, val = kv
        next_val = top.values[i+1] if i+1 < len(top.values) else 0

        col_dates  [col].append(date)
        col_values [col].append(val - next_val)
        col_bottoms[col].append(next_val)

Plotting the bar chart:

fig = pyplot.figure(figsize=(20,10))
ax = fig.add_subplot(1,1,1)

for col,vals in col_values.items():
    dates   = col_dates[col]
    bottoms = col_bottoms[col]

    ax.bar(matplotlib.dates.date2num(dates), vals, width=.6, bottom=bottoms, label=col)
    ax.xaxis_date()

ax.legend(loc='best', fontsize='large')

pyplot.show()

The resulting plot:

enter image description here

0
votes

You can do this with a simple apply. It will not be vectorized but I think it's much clearer to read. In this case I filled NaN with -np.inf because sort doesn't work well with NaN values.

import pandas as pd
import numpy as np

data = {
    'A': [ 65, 54, 12, 14, 30, np.nan ],
    'B': [ 54, 47, 60, 34, 40, 35 ],
    'C': [ 34, 39, 57, 56, 48, np.nan ],
    'D': [ 20, 18, 47, 47, 35, 70 ]
}

df = pd.DataFrame(index=pd.date_range('2018-01-01', '2018-01-06').date,
                  data=data,
                  dtype=np.float64)

df.fillna(-np.inf, inplace=True)

def search_rows(row):
    return np.where(row.isin(sorted(row, reverse=True)[:3]), row, -np.inf)

df = df.apply(search_rows, axis=1)
df.plot.bar(stacked=True)