I cannot wrap my head around the axes parameter: what it contains and how to use it for making subplots.

I would really appreciate it if someone could explain what is going on in the following example:

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))
for idx, feature in enumerate(df.columns[:-1]):
  df.plot(feature, "cnt", subplots=True, kind="scatter", ax=axes[idx / 4, idx % 4])

Here is the data (UCI Bike sharing dataset): [image: a table with 5 rows of raw data]

Here is the output of the code snippet (a pairwise comparison of features and the end results): [image: a beautiful chart with subplots]

To be more specific, here are the parts that I do understand (at least I think I do)

  • plt.subplots returns a tuple containing a figure and axes object(s) (link)
  • enumerate() yields tuples, each containing the index of a feature and its name (link)
  • df.plot uses column names to put data on subplots within fig

Here is what I do not understand

  • What does the axes object contain? Again, based on the documentation and this answer, I do realize that axes contains "Axis, Tick, Line2D, Text, Polygon, etc.", but
    • what do we address using axes[x,y]?
    • why did the author of this example decide to use [idx / 4, idx % 4] as the index values?

2 Answers

2
votes

Concerning the last question about the array indexing as [idx / 4, idx % 4]:

The idea is to loop over all subplots and all dataframe columns at the same time. The problem is that the axes array is two-dimensional while the column array is one-dimensional. One therefore needs to decide which of the two to loop over, and map the loop index or indices to the other dimension.

An intuitive way would be to use two loops

for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
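        df.plot(df.columns[i*axes.shape[1]+j], "cnt", ... , ax=axes[i,j])

Here, i*axes.shape[1]+j maps the two dimensions of the numpy array to the single dimension of the columns list: each complete row contributes axes.shape[1] entries (the number of columns in the grid), so row i starts at index i*axes.shape[1].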
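A minimal sketch of such a row-major mapping from a grid position (i, j) to a single index, assuming the 3-by-4 grid from the question. The names nrows, ncols and flat_indices are introduced here only for illustration:

```python
# Assumed grid shape, matching plt.subplots(nrows=3, ncols=4) in the question.
nrows, ncols = 3, 4

flat_indices = []
for i in range(nrows):
    for j in range(ncols):
        # Row i starts after i complete rows of ncols entries each.
        flat_indices.append(i * ncols + j)

print(flat_indices)
```

Every index from 0 to 11 appears exactly once, which is what is needed to pair each subplot with a distinct column.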

In the example from the question, the loop is over the columns, which means we have to somehow map the one-dimensional index to two dimensions. This is what [idx / 4, idx % 4] does, or rather should do: it only works in Python 2, where / between two integers is an integer division. In Python 3, / returns a float, which cannot be used as an array index. To make it easier to understand and version-safe, one should use [idx // 4, idx % 4]. The // makes it explicit that an integer (floor) division is used. So for the first 4 idx values (0,1,2,3), idx // 4 is 0, for the next set of 4 values it's 1, and so on. idx % 4 calculates the index modulo 4. So (0,1,2,3) are mapped to (0,1,2,3), and then (4,5,6,7) are mapped to (0,1,2,3) again, etc.
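To see the mapping for every index at once, here is a small sketch in plain Python 3. It assumes 4 subplot columns as in the question; ncols and positions are illustrative names, and divmod is used only as a cross-check:

```python
ncols = 4  # assumed number of subplot columns, as in the question

print(5 / 4, 5 // 4)  # prints "1.25 1": true division vs. floor division

for idx in range(12):
    row, col = idx // ncols, idx % ncols
    # divmod returns the same (quotient, remainder) pair in one call.
    assert (row, col) == divmod(idx, ncols)
    print(idx, "->", (row, col))

positions = [divmod(idx, ncols) for idx in range(12)]
```

For example, idx 5 lands at (1, 1), the second row and second column, and idx 11 lands at (2, 3), the last subplot.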

An alternative solution using a single loop would be to flatten the axes array:

for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=axes.flatten()[idx])

or, perhaps most Pythonic:

for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=ax)
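The flattened order matches the row and column arithmetic, which can be checked with NumPy alone. This is a sketch under the assumption that a 3-by-4 array of text labels stands in for the array of Axes objects, so it runs without matplotlib; grid and flat are illustrative names:

```python
import numpy as np

# Stand-in for the axes array: labels instead of Axes objects (an assumption
# made so that the example needs only NumPy).
labels = [f"ax[{r},{c}]" for r in range(3) for c in range(4)]
grid = np.array(labels).reshape(3, 4)

flat = grid.flatten()  # 1-D copy of the array, in row-major order
for idx in range(grid.size):
    # Element idx of the flat array sits at row idx // 4, column idx % 4.
    assert flat[idx] == grid[idx // 4, idx % 4]

print(flat[5])  # prints "ax[1,1]"
```

Note that zip stops at the shorter of its inputs, so with fewer features than subplots the remaining subplots are simply left empty.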
2
votes

The axes object in your code is a 2D NumPy array of matplotlib Axes objects. Since the call to subplots() asked for 3 rows and 4 columns, the array will be 3 by 4. Indexing into the array like axes[r, c] gives you the Axes object that corresponds to row r and column c, and you can pass that object as the ax keyword argument to a plotting method to make the plot show up on that subplot. E.g. if you wanted to plot something in the second row and second column, you would call plot(..., ax=axes[1,1]).
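A short sketch that inspects the array directly, assuming matplotlib is installed. The Agg backend and the plotted numbers are choices made here for illustration and are not part of the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

print(type(axes).__name__, axes.shape)  # prints "ndarray (3, 4)"

ax = axes[1, 1]                # second row, second column
ax.plot([0, 1, 2], [0, 1, 4])  # made-up data, drawn only on that subplot
ax.set_title("axes[1, 1]")
```

The 2D array shape applies when both nrows and ncols are greater than 1; with the default squeeze=True, a single row or column gives a 1D array, and a 1-by-1 grid gives a single Axes object.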

The code uses [idx / 4, idx % 4] as a way of converting the indices (numbers from 0 to 11) into locations in the 3-by-4 grid. In Python 3 the first term needs to be written idx // 4, because / produces a float there. Try evaluating that expression yourself with idx set to each value from 0 to 11 in turn, and you'll see how it works out.