I am not aware of any plotting package that lets you create this plot in a straightforward way based on how your sample table is structured. One option could be to compute a start and an end variable and then create the plot as in the answers to this question, for example using the Altair Gantt chart like in this answer.
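As a rough sketch of that first option, here is one way the start and end variables could be computed with pandas. The small df_demo table and the run_id and periods names are made up for illustration, and I assume that each row covers exactly one second, which is why one second is added to the last timestamp of each run:

```python
import pandas as pd

# Hypothetical sample: one row per second with a light-state code
idx = pd.date_range("2021-01-04 08:00:00", periods=8, freq="s")
df_demo = pd.DataFrame({"currState": [1, 1, 1, 2, 2, 3, 3, 3]}, index=idx)

# Give each run of consecutive identical codes its own id (1, 2, 3, ...)
changed = df_demo["currState"] != df_demo["currState"].shift()
run_id = changed.cumsum().rename("run")

# Collapse each run into one row: its code, first timestamp and end timestamp
ts = df_demo.index.to_series()
periods = pd.DataFrame({
    "currState": df_demo["currState"].groupby(run_id).first(),
    "start": ts.groupby(run_id).min(),
    # Assumption: each row lasts one second, so the run ends 1 s after its last row
    "end": ts.groupby(run_id).max() + pd.Timedelta(seconds=1),
})
print(periods)
```

A table shaped like periods (one row per light phase) is what Gantt-style plotting functions generally expect.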
Here, I offer two solutions using matplotlib. By taking a look at the matplotlib gallery, I stumbled on the broken_barh plotting function, which provides a way to create a plot like the one you want. There are two main hurdles to overcome when using it:
- Deciding what unit to use for the x-axis and computing the xranges argument accordingly;
- Creating and formatting the x ticks and tick labels.
Let me first create a sample dataset that resembles yours. Note that you will need to adjust the color_dict to your codes:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
## Create sample dataset
# Light color codes
gre = 1
yel_to_red = 2
red = 3
yel_to_gre = 4
color_dict = {1: 'green', 2: 'yellow', 3: 'red', 4: 'yellow'}
# Light color duration in seconds
sec_g = 45
sec_yr = 3
sec_r = 90
sec_yg = 1
# Light cycle
light_cycle = [gre, yel_to_red, red, yel_to_gre]
sec_cycle = [sec_g, sec_yr, sec_r, sec_yg]
ncycles = 3
sec_total = ncycles*sum(sec_cycle)
# Create variables and store them in a pandas dataframe with the datetime as index
IntersectionId = 12345
currState = np.repeat(ncycles*light_cycle, repeats=ncycles*sec_cycle)
time_sec = pd.date_range(start='2021-01-04 08:00:00', freq='s', periods=sec_total) # the 'S' alias is deprecated in pandas >= 2.2
df = pd.DataFrame(dict(IntersectionId = np.repeat(IntersectionId, repeats=sec_total),
                       currState = currState),
                  index = time_sec)
The broken_barh function takes the data in the form of tuples: for each colored rectangle that makes up the horizontal bar, you need to provide the xy coordinates of the bottom-left corner as well as the length along each axis, like so:
xranges=[(x1_start, x1_length), (x2_start, x2_length), ... ], yrange=(y_all_start, y_all_height)
Note that yrange (passed below through the yranges variable) applies to all rectangles. The unit that is chosen for the x-axis determines how the data must be processed and how the x ticks and tick labels can be created. Here are two alternatives.
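To make the argument format concrete, here is a minimal sketch that is independent of the dataset. The three rectangles and their colors are arbitrary illustrative values, with the x-axis simply counted in seconds:

```python
import matplotlib.pyplot as plt

# One (x_start, x_length) tuple per rectangle; values are illustrative only
demo_xranges = [(0, 45), (45, 3), (48, 90)]
# A single (y_start, y_height) tuple shared by all rectangles
demo_yrange = (0.75, 0.5)

fig, ax = plt.subplots(figsize=(6, 1.5))
bars = ax.broken_barh(demo_xranges, demo_yrange,
                      facecolors=["green", "yellow", "red"])
ax.set_xlim(0, 138)
ax.set_ylim(0, 2)
```

Calling plt.show() afterwards should display one horizontal bar made of three colored segments.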
Matplotlib broken_barh with matplotlib date number as x-axis scale
In this approach, the timestamps of the rows where the light changes are extracted and then converted to matplotlib date numbers. This makes it possible to use a matplotlib date tick locator and formatter. This approach of using the matplotlib date for the x-axis values to simplify tick formatting was inspired by this answer by ImportanceOfBeingErnest.
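As a small illustration of what this conversion produces, the sketch below converts two arbitrary timestamps that are one second apart. Matplotlib date numbers are floats counted in days, so the difference should be very close to 1/86400:

```python
from datetime import datetime
import matplotlib.dates as mdates

# Two illustrative timestamps, one second apart
t0 = datetime(2021, 1, 4, 8, 0, 0)
t1 = datetime(2021, 1, 4, 8, 0, 1)

# date2num returns floats expressed in days since matplotlib's epoch
x0, x1 = mdates.date2num([t0, t1])

# One second expressed in the same unit (days)
one_second = 1 / (24 * 60 * 60)
print(x1 - x0, one_second)
```

This is why durations on this x-axis scale are tiny fractions of a day rather than whole seconds.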
For both this solution and the next one, the code for getting the indices of light changes and computing the lengths of the periods is based on this answer by Jaime, thanks to the general idea provided by this Gist by alimanfoo.
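The idea behind that code can be sketched on a tiny array. The codes array here is a made-up example, and the variable names are only illustrative:

```python
import numpy as np

# Hypothetical sequence of light codes, one value per second
codes = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])

# True wherever a value differs from the previous one; the first element
# always starts a new run
is_start = np.concatenate(([True], codes[:-1] != codes[1:]))

# Indices of the first row of each run (the comma unpacks the 1-tuple)
run_starts, = np.where(is_start)

# Length of each run: distance to the next start, or to the end of the array
run_lengths = np.diff(run_starts, append=codes.size)

print(run_starts)   # [0 3 5]
print(run_lengths)  # [3 2 4]
```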
## Compute variables needed to define the plotting function arguments
states = np.array(df['currState'])
# Create a list of indices of the rows where the light changes
# (i.e. where a new currState code section starts)
starts_indices = np.where(np.concatenate(([True], states[:-1] != states[1:])))
# Append the last index to be able to compute the duration of the last
# light color period recorded in the dataset
starts_end_indices = np.append(starts_indices, states.size-1)
# Get the timestamps of those rows and convert them to python datetime format
starts_end_pydt = df.index[starts_end_indices].to_pydatetime()
# Convert the python timestamps to matplotlib date number that is used as the
# x-axis unit, this makes it easier to format the tick labels
starts_end_x = mdates.date2num(starts_end_pydt)
# Get the duration of each light color in matplotlib date number units
lengths = np.diff(starts_end_x)
# Add one second (expressed in matplotlib date number units, i.e. days) to
# the duration of the last light to make the bar chart left and right
# inclusive instead of just left inclusive
pydt_second = (max(starts_end_x) - min(starts_end_x))/starts_end_indices[-1]
lengths[-1] = lengths[-1] + pydt_second
# Compute the arguments for the broken_barh plotting function
xranges = [(start, length) for start, length in zip(starts_end_x, lengths)]
yranges = (0.75, 0.5)
colors = df['currState'].iloc[starts_end_indices[:-1]].map(color_dict) # .iloc: select rows by position
## Create horizontal bar with colors by using the broken_barh function
## and format ticks and tick labels
fig, ax = plt.subplots(figsize=(10,2))
ax.broken_barh(xranges, yranges, facecolors=colors, zorder=2)
# Create and format x ticks and tick labels
loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(loc)
formatter = mdates.AutoDateFormatter(loc)
formatter.scaled[1/(24.*60.)] = '%H:%M:%S' # adjust this according to time range
ax.xaxis.set_major_formatter(formatter)
# Format y-axis and create y tick and tick label
ax.set_ylim(0, 2)
ax.set_yticks([1])
ax.set_yticklabels([df['IntersectionId'].iloc[0]])
plt.grid(axis='x', alpha=0.5, zorder=1)
plt.show()
Matplotlib broken_barh with seconds as x-axis scale
This approach takes advantage of the fact that the indices of the table can be used to compute the lights' durations in seconds. The downside is that this time the x ticks and tick labels must be created from scratch. The code is written so that labels automatically have a nice format depending on the total duration covered by the dataset. The only thing that needs adjusting is the number of ticks, as this depends on how wide the figure is.
The code used to automatically select an appropriate time step between ticks is based on this answer by kennytm. The datetime string format codes are listed here.
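The selection of the time step can be sketched in isolation as follows. The pick_time_step helper is a hypothetical name introduced only for this illustration; it uses the same list of round steps and the same closest-value rule as the code below:

```python
round_time_steps = [15, 30, 60, 120, 180, 240, 300, 600, 900, 1800, 3600, 7200, 14400]

def pick_time_step(trange, approx_nticks, candidates=round_time_steps):
    """Return the candidate step (in seconds) closest to trange // approx_nticks."""
    target = trange // approx_nticks
    return min(candidates, key=lambda step: abs(step - target))

# 416 s is the span between the first and last row of the sample dataset
step_short = pick_time_step(416, 6)
# A hypothetical two-hour recording
step_long = pick_time_step(2 * 3600, 6)

print(step_short, step_long)  # 60 900
```

With the sample dataset this yields a 60-second step, so the tick labels keep the seconds in their format.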
## Compute the variables needed for the plotting function arguments
## using the currState variable
states = np.array(df['currState'])
# Create list of indices indicating the rows where the currState code
# changes: note the comma to unpack the tuple
starts_indices, = np.where(np.concatenate(([True], states[:-1] != states[1:])))
# Compute durations of each light in seconds
lengths = np.diff(starts_indices, append=states.size)
## Compute the arguments for the plotting function
xranges = [(start, length) for start, length in zip(starts_indices, lengths)]
yranges = (0.75, 0.5)
colors = df['currState'].iloc[starts_indices].map(color_dict) # .iloc: select rows by position
## Create horizontal bar with colors using the broken_barh function
fig, ax = plt.subplots(figsize=(10,2))
ax.broken_barh(xranges, yranges, facecolors=colors, zorder=2)
## Create appropriate x ticks and tick labels
# Define time variable and parameters needed for computations
time = pd.DatetimeIndex(df.index).asi8 // 10**9 # time is in seconds
tmin = min(time)
tmax = max(time)
trange = tmax-tmin
# Choose the approximate number of ticks, the exact number depends on
# the automatically selected time step
approx_nticks = 6 # low number selected because figure width is only 10 inches
round_time_steps = [15, 30, 60, 120, 180, 240, 300, 600, 900, 1800, 3600, 7200, 14400]
time_step = min(round_time_steps, key=lambda x: abs(x - trange//approx_nticks))
# Create list of x ticks including the right boundary of the last time point
# in the dataset regardless of whether or not it is aligned with the time step
timestamps = np.append(np.arange(tmin, tmax, time_step), tmax+1)
xticks = timestamps-tmin
ax.set_xticks(xticks)
# Create x tick labels with format depending on time step
fmt_time = '%H:%M:%S' if time_step <= 60 else '%H:%M'
xticklabels = [pd.to_datetime(ts, unit='s').strftime(fmt_time) for ts in timestamps]
ax.set_xticklabels(xticklabels)
## Format y-axis limits, tick and tick label
ax.set_ylim(0, 2)
ax.set_yticks([1])
ax.set_yticklabels([df['IntersectionId'].iloc[0]])
plt.grid(axis='x', alpha=0.5, zorder=1)
plt.show()
Further documentation: to_datetime, to_pydatetime, strftime