How to get nicely formatted dates like the pandas line plot
The issue is that the pandas bar plot processes the date variable as a categorical variable where each date is considered to be a unique category, so the x-axis units are set to integers starting at 0 (like the default DataFrame index when none is assigned) and the full string of each date is shown without any automatic formatting.
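The categorical behaviour described above can be checked with a minimal sketch like the following. The small `df_demo` frame and the `ax_demo` name are illustrative and not part of the question's data, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed here for portability
import numpy as np
import pandas as pd

# Illustrative five-day time series (not the data from the question)
idx_demo = pd.date_range('2012-01-01', periods=5, freq='D')
df_demo = pd.DataFrame({'A': np.arange(5.0)}, index=idx_demo)

ax_demo = df_demo.plot.bar()

# The bars sit at integer positions 0, 1, 2, ... rather than at date values
print(ax_demo.get_xticks())

# Each tick label is the full string of the corresponding timestamp
print([label.get_text() for label in ax_demo.get_xticklabels()])
```

This is why matplotlib date locators and formatters have no effect on a pandas bar plot unless the bars are first moved to date units, as done in solution 2.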
Here are two solutions to format the date tick labels of a pandas (stacked) bar chart of a time series:
- The first is a variation of the answer by unutbu and is made to better fit the data shown in the question;
- The second is a generalized solution that lets you use matplotlib date tick locators and formatters, which produce appropriate date labels for time series of any frequency.
But first, let's see what the nicely formatted tick labels look like when the sample data is plotted with a pandas line plot.
Default pandas line plot date formatting
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create sample dataset with a daily frequency and resample it to a weekly frequency
rng = np.random.default_rng(seed=123) # random number generator
idx = pd.date_range(start='2012-01-01', end='2013-12-31', freq='D')
df_raw = pd.DataFrame(rng.random(size=(idx.size, 3)),
index=idx, columns=list('ABC'))
df = df_raw.resample('W').sum() # default is 'W-SUN'
# Create pandas stacked line plot
ax = df.plot(stacked=True, figsize=(10,5))
Because the data is grouped by week with timestamps for Sundays (frequency W-SUN), the monthly tick labels are not necessarily placed on the first day of the month, and there can be 3 or 4 weeks between each first week of the month, so the minor ticks are unevenly spaced (noticeable if you look closely). Here are the exact dates of the major ticks:
# Convert major x ticks to date labels
np.array([mdates.num2date(tick*7-4).strftime('%Y-%b-%d') for tick in ax.get_xticks()])
"""
array(['2012-Jan-01', '2012-Apr-01', '2012-Jul-01', '2012-Oct-07',
'2013-Jan-06', '2013-Apr-07', '2013-Jul-07', '2013-Oct-06',
'2014-Jan-05'], dtype='<U11')
"""
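The uneven spacing can also be checked from the timestamps alone. The sketch below assumes a plain index of Sundays built directly with `pd.date_range` (rather than the resampled `df`), and the names `sundays`, `first_sundays` and `gaps` are illustrative.

```python
import pandas as pd

# Weekly index of Sundays covering the same period as the sample data
sundays = pd.date_range(start='2012-01-01', end='2013-12-31', freq='W-SUN')

# First Sunday of each calendar month
first_sundays = sundays.to_series().groupby(sundays.to_period('M')).min()

# Number of days between consecutive first Sundays
gaps = first_sundays.diff().dropna().dt.days
print(sorted(gaps.unique()))
```

Consecutive first Sundays are either 28 or 35 days apart, which is why a fixed tick step cannot land on the first week of every month.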
The challenge lies in selecting the ticks for each first week of the month, seeing as they are unequally spaced. Other answers have provided simple solutions based on a fixed tick frequency, which produce oddly spaced labels in terms of dates, where a month is sometimes repeated (for example the month of July in unutbu's answer). Or they have provided solutions based on a monthly time series instead of a weekly time series, which is simpler to format seeing as there are always 12 months per year. So here is a solution that gives nicely formatted tick labels like in the pandas line plot and that works for any frequency of data.
Solution 1: pandas bar plot with tick labels based on the DatetimeIndex
# Create pandas stacked bar chart
ax = df.plot.bar(stacked=True, figsize=(10,5))
# Create list of monthly timestamps by selecting the first weekly timestamp of each
# month (in this example, the first Sunday of each month)
monthly_timestamps = [timestamp for idx, timestamp in enumerate(df.index)
if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Automatically select appropriate number of timestamps so that x-axis does
# not get overcrowded with tick labels
step = 1
while len(monthly_timestamps[::step]) > 10: # increase number if time range >3 years
    step += 1
timestamps = monthly_timestamps[::step]
# Create tick labels from timestamps
labels = [ts.strftime('%b\n%Y') if ts.year != timestamps[idx-1].year
else ts.strftime('%b') for idx, ts in enumerate(timestamps)]
# Set major ticks and labels
ax.set_xticks([df.index.get_loc(ts) for ts in timestamps])
ax.set_xticklabels(labels)
# Set minor ticks without labels
ax.set_xticks([df.index.get_loc(ts) for ts in monthly_timestamps], minor=True)
# Rotate and center labels
ax.figure.autofmt_xdate(rotation=0, ha='center')
To my knowledge, there is no way of getting this exact label formatting with the matplotlib.dates (mdates) tick locators and formatters. Nevertheless, combining mdates functionalities with a pandas stacked bar plot can come in handy if you prefer using tick locators/formatters or if you want to have dynamic ticks when using the interactive interface of matplotlib (to pan/zoom in and out).
At this point, it may be useful to consider creating the stacked bar plot in matplotlib directly, where you need to loop through the variables to create the stacked bar. The pandas-based solution shown below works by looping through the patches of the bars to relocate them according to matplotlib date units. So it is basically one loop instead of another, up to you to see which is more convenient.
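For comparison, here is a minimal sketch of the matplotlib-only alternative mentioned above, where the loop runs over the variables instead of the patches. It assumes a small weekly sample built for illustration (`df_mpl`), a bar width of 3.5 days chosen to mirror half of the weekly spacing, and the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed here for portability
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative weekly sample (not the resampled data from the question)
rng_mpl = np.random.default_rng(seed=123)
idx_mpl = pd.date_range('2012-01-01', periods=20, freq='W-SUN')
df_mpl = pd.DataFrame(rng_mpl.random(size=(idx_mpl.size, 3)),
                      index=idx_mpl, columns=list('ABC'))

fig_mpl, ax_mpl = plt.subplots(figsize=(10, 5))
x_md = mdates.date2num(df_mpl.index)  # bar centers in matplotlib date units
bottom = np.zeros(len(df_mpl))        # running top of each stacked bar

# Draw one layer of bars per variable, stacking on the previous layers
for col in df_mpl.columns:
    heights = df_mpl[col].to_numpy()
    ax_mpl.bar(x_md, heights, width=3.5, bottom=bottom, label=col)
    bottom += heights

ax_mpl.legend()
ax_mpl.xaxis_date()  # treat the numeric x values as dates

# Date locator and formatter work directly because x is in date units
loc_mpl = mdates.AutoDateLocator()
ax_mpl.xaxis.set_major_locator(loc_mpl)
ax_mpl.xaxis.set_major_formatter(mdates.ConciseDateFormatter(loc_mpl))
```

With this approach the bars are in date units from the start, so no patches need to be relocated afterwards.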
Solution 2: pandas bar plot with matplotlib tick locators and formatters
This generalized solution uses the mdates AutoDateLocator, which places ticks at the beginning of months/years. If you generate data and timestamps with pd.date_range in pandas (like in this example), you should keep in mind that the commonly used 'M' and 'Y' frequencies produce timestamps for the end date of the periods. The code given in the following example aligns monthly/yearly tick marks with 'MS' and 'YS' frequencies.
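The difference between end-of-period and start-of-period timestamps can be seen in a short sketch like this one. Offset objects are used instead of string aliases because, to my understanding, the month-end alias is spelled differently in recent pandas releases ('ME' rather than 'M'); the variable names are illustrative.

```python
import pandas as pd

# Month-end timestamps (what the 'M' alias gives in the pandas version used here)
month_ends = pd.date_range('2012-01-01', periods=3, freq=pd.offsets.MonthEnd())

# Month-start timestamps (equivalent to the 'MS' alias)
month_starts = pd.date_range('2012-01-01', periods=3, freq=pd.offsets.MonthBegin())

print(month_ends.strftime('%Y-%m-%d').tolist())
# ['2012-01-31', '2012-02-29', '2012-03-31']
print(month_starts.strftime('%Y-%m-%d').tolist())
# ['2012-01-01', '2012-02-01', '2012-03-01']
```

Only the month-start timestamps coincide with the positions where AutoDateLocator places monthly ticks.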
If you import a dataset using end-of-period dates (or some other type of pandas frequency not aligned with AutoDateLocator ticks), I am not aware of any convenient way to shift the AutoDateLocator accordingly so that the labels become correctly aligned with the bars. I see two options: i) resample the data using df.resample('MS').sum() if that does not cause any issue regarding the meaning of the underlying data; ii) or else use another date locator.
This issue causes no problem in the following example seeing as the data has a week-end frequency 'W-SUN', so the monthly/yearly labels placed at a month/year start frequency are fine.
# Create pandas stacked bar chart with the default bar width = 0.5
ax = df.plot.bar(stacked=True, figsize=(10,5))
# Compute width of bars in matplotlib date units, 'md' (in days) and adjust it if
# the bar width in df.plot.bar has been set to something else than the default 0.5
bar_width_md_default, = np.diff(mdates.date2num(df.index[:2]))/2
bar_width = ax.patches[0].get_width()
bar_width_md = bar_width*bar_width_md_default/0.5
# Compute new x values in matplotlib date units for the patches (rectangles) that
# make up the stacked bars, adjusting the positions according to the bar width:
# if the frequency is in months (or years), the bars may not always be perfectly
# centered over the tick marks depending on the number of days difference between
# the months (or years) given by df.index[0] and [1] used to compute the bar
# width, this should not be noticeable if the bars are wide enough.
x_bars_md = mdates.date2num(df.index) - bar_width_md/2
nvar = len(ax.get_legend_handles_labels()[1])
x_patches_md = np.ravel(nvar*[x_bars_md])
# Set bars to new x positions and adjust width: this loop works fine with NaN
# values as well because in bar plot NaNs are drawn with a rectangle of 0 height
# located at the foot of the bar, you can verify this with patch.get_bbox()
for patch, x_md in zip(ax.patches, x_patches_md):
    patch.set_x(x_md)
    patch.set_width(bar_width_md)
# Set major ticks
maj_loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(maj_loc)
# Show minor tick under each bar (instead of each month) to highlight
# discrepancy between major tick locator and bar positions seeing as no tick
# locator is available for first-week-of-the-month frequency
ax.set_xticks(x_bars_md + bar_width_md/2, minor=True)
# Set major tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(fmt)
# Shift the plot frame to where the bars are now located
xmin = min(x_bars_md) - bar_width_md
xmax = max(x_bars_md) + 2*bar_width_md
ax.set_xlim(xmin, xmax)
# Adjust tick label format last, else it may sometimes not be applied correctly
ax.figure.autofmt_xdate(rotation=0, ha='center')
Minor ticks are displayed under each bar to highlight the fact that the timestamps of the bars often do not coincide with a month/year start marked by the labels of the AutoDateLocator ticks. I am not aware of any date locator that can be used to select ticks for the first week of each month and reproduce exactly the result shown in solution 1.
Documentation: date format codes, mdates.ConciseDateFormatter