Using DataFrame.plot to make a chart with subplots — how to use ax parameter

Question

I cannot wrap my head around the axes parameter: what it contains and how to use it for making subplots.

I would really appreciate it if someone could explain what is going on in the following example:

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))
for idx, feature in enumerate(df.columns[:-1]):
  df.plot(feature, "cnt", subplots=True, kind="scatter", ax=axes[idx / 4, idx % 4])

Here is the data (UCI Bike Sharing dataset):

Here is the output of the code snippet (a pairwise comparison of the features and the end result):

To be more specific, here are the parts that I do understand (at least I think I do):

- plt.subplots returns a tuple containing a figure and axes object(s) (link)
- enumerate() returns a tuple containing the index of a feature and its name (link)
- df.plot uses column names to put data on subplots within fig

Here is what I do not understand:

What does the axes object contain? Again, based on the documentation and this answer, I realize that axes contains "Axis, Tick, Line2D, Text, Polygon, etc.", but
- what do we address using axes[x, y]?
- why did the author of this example decide to use [idx / 4, idx % 4] as values?

ImportanceOfBeingErnest · Accepted Answer · 2017-03-29T09:41:49

Concerning the last question about the array indexing as [idx / 4, idx % 4]:

The idea is to loop over all subplots and all dataframe columns at the same time. The problem is that the axes array is two-dimensional while the column array is one-dimensional. One therefore needs to decide which of the two to loop over, and then map the loop index or indices to the other dimension.
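To see the mismatch concretely, here is a minimal sketch. The Agg backend is an assumption so that it runs without a display, the 3x4 grid is taken from the question, and no data is plotted. It suggests that plt.subplots hands back a NumPy array of individual Axes objects, so axes[row, col] selects one subplot:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

# With more than one row and more than one column, axes is a 2-D NumPy array
# whose elements are the individual Axes (subplot) objects.
print(type(axes).__name__, axes.shape)

# Indexing with [row, column] picks out a single subplot.
top_right = axes[0, 3]
top_right.set_title("row 0, column 3")

# flatten() returns a 1-D array in row-major order (left to right, top to bottom)
# that holds the very same Axes objects.
flat = axes.flatten()
print(flat[3] is top_right)

plt.close(fig)
```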

An intuitive way would be to use two loops:

for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
        df.plot(df.columns[i*axes.shape[1]+j], "cnt", ... , ax=axes[i,j])

Here, i*axes.shape[1]+j maps the two dimensions of the numpy array to the single dimension of the columns list: each row holds axes.shape[1] subplots, so the row index is multiplied by the number of columns.
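A quick check of this kind of mapping, as a sketch in plain Python (the 3x4 grid size is taken from the question, the variable names are only illustrative). The row index has to be scaled by the number of columns; scaling by the number of rows only happens to work for square grids:

```python
n_rows, n_cols = 3, 4  # grid shape from the question

# Row-major flat index: every row is n_cols wide, so the row index is scaled by n_cols.
flat = [i * n_cols + j for i in range(n_rows) for j in range(n_cols)]
print(flat)

# Scaling by n_rows instead makes neighbouring rows overlap on a non-square grid,
# so some column indices are visited twice and others are never reached.
wrong = [i * n_rows + j for i in range(n_rows) for j in range(n_cols)]
print(sorted(set(wrong)) == list(range(n_rows * n_cols)))
```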

In the example from the question, the loop is over the columns, which means we have to map the one-dimensional index to two dimensions. This is what [idx / 4, idx % 4] does, or rather what it should do: it only works in Python 2, where / between integers is integer division. In Python 3, / returns a float, which cannot be used as an array index. To make it more comprehensible and version-safe, one should use [idx // 4, idx % 4]. The // makes it explicit that integer division is used. So for the first 4 idx values (0,1,2,3), idx // 4 is 0, for the next 4 values it is 1, and so on. idx % 4 calculates the index modulo 4, so (0,1,2,3) are mapped to (0,1,2,3), then (4,5,6,7) are mapped to (0,1,2,3) again, etc.
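The index arithmetic can be tried out without any plotting. This is a small sketch in plain Python; the built-in divmod is shown only as an equivalent spelling, not as something the original code used:

```python
n_cols = 4  # number of columns in the subplot grid from the question

pairs = []
for idx in range(12):
    row, col = idx // n_cols, idx % n_cols
    # divmod returns the same (quotient, remainder) pair in a single call
    assert (row, col) == divmod(idx, n_cols)
    pairs.append((row, col))
    print(idx, "->", (row, col))

# In Python 3, "/" is true division and yields a float even for integer operands,
# while "//" keeps the result an integer.
print(type(5 / 4).__name__, type(5 // 4).__name__)
```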

An alternative solution using a single loop would be to flatten the axes array:

for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=axes.flatten()[idx])

or, perhaps most Pythonic:

for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=ax)
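For completeness, here is a self-contained sketch of the zip variant that can be run as-is. The data is synthetic: the feature_0 ... feature_11 column names and the random values are made up as a stand-in for the bike sharing dataset, and the Agg backend is an assumption so that it runs without a display. The subplots=True argument from the question is left out, on the assumption that it is not needed when an explicit ax is passed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 12 made-up feature columns plus "cnt".
rng = np.random.default_rng(0)
feature_names = [f"feature_{k}" for k in range(12)]  # hypothetical column names
df = pd.DataFrame(rng.random((50, 12)), columns=feature_names)
df["cnt"] = rng.integers(0, 1000, size=50)

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

# Pair each subplot with one feature column; "cnt" (the last column) is skipped.
for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(x=feature, y="cnt", kind="scatter", ax=ax)

fig.tight_layout()
```

Because zip stops at the shorter of its two inputs, a dataframe with fewer than 12 features would simply leave the remaining subplots empty. Call plt.show() or fig.savefig(...) afterwards to see the result.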
