2
votes

Pandas sometimes decides to plot DataFrames with timeindex in different ways.

I am plotting a pandas time series DataFrame using df.plot(), and the dates are displayed differently for different columns. I don't understand why. I am plotting data for 18 June, so sometimes the x axis shows the hours (06:00, 08:00, ...) and sometimes the date and hour in a very confusing way (06-18 06, 06-18 08, ...). Why? Same df, different columns, same time index.

2
Impossible to answer without data and code to reproduce this - EdChum
I understand, I think it comes from NaN present for some timeindexes in the second graph. - tcapelle
Then you need to consider what to do with those index values, drop them, fill them etc.. - EdChum
They were dropped. - tcapelle

2 Answers

2
votes

Let's create a minimal example. The data is equally spaced with exactly 5 hours in between (5h00, 10h00, 15h00).

import pandas as pd
import matplotlib.pyplot as plt

index = pd.to_datetime(["2019-09-11 05:00:00",
                        "2019-09-11 10:00:00",
                        "2019-09-11 15:00:00"])

pd.DataFrame({"x" : [1,2,4], "y" : [3,4,4]}, index=index).plot()
plt.show()

It will result in this plot:

enter image description here

Now, let's add 30 seconds to one of the datetimes:

index = pd.to_datetime(["2019-09-11 05:00:00",
                        "2019-09-11 10:00:30",  # <-- added 30 seconds here
                        "2019-09-11 15:00:00"])

Now the data isn't equally spaced anymore, and the result is this:

enter image description here

So in the latter case pandas does not treat it as a "ts_plot". "ts" presumably stands for time series, though one could argue that both are time series anyway. The latter case, however, has no regular frequency and cannot be resampled, and that seems to be the underlying distinction.
Unfortunately, pandas ties the tick formatter to this kind of time series axis, and it cannot be changed manually.
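To see in advance which of the two cases an index falls into, one option is to check whether pandas can infer a regular frequency from it. This is only a sketch: pd.infer_freq is a public pandas function, but treating its result as the thing that decides the plot type is an assumption about pandas internals, and the variable names are made up for illustration.

```python
import pandas as pd

# Equally spaced index (exactly 5 hours apart), as in the first plot.
regular = pd.to_datetime(["2019-09-11 05:00:00",
                          "2019-09-11 10:00:00",
                          "2019-09-11 15:00:00"])

# Same index with 30 seconds added to the middle timestamp.
irregular = pd.to_datetime(["2019-09-11 05:00:00",
                            "2019-09-11 10:00:30",
                            "2019-09-11 15:00:00"])

# infer_freq returns a frequency string when the spacing is regular
# and None when no single frequency fits the index.
regular_freq = pd.infer_freq(regular)
irregular_freq = pd.infer_freq(irregular)

print("regular:", regular_freq)
print("irregular:", irregular_freq)
```

If rows were dropped from an otherwise regular index (for example rows containing NaN), the remaining index has gaps, and the same check would be expected to return None for it.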

You can get consistent results by passing x_compat=True to the plot function. This makes sure no "ts"-like axis is used, regardless of the data, so it always results in the second kind of plot.

df.plot(x_compat=True)

The advantage of this is that you can manually change the format of those normal plots via matplotlib.dates formatters and locators.
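As a minimal sketch of that, the following reuses the irregular example index from above and sets an hour-based locator and an "HH:MM" formatter. The 2-hour interval and the format string are arbitrary choices for illustration, not something the question prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop for interactive use
import matplotlib.dates as mdates
import pandas as pd

index = pd.to_datetime(["2019-09-11 05:00:00",
                        "2019-09-11 10:00:30",
                        "2019-09-11 15:00:00"])
df = pd.DataFrame({"x": [1, 2, 4], "y": [3, 4, 4]}, index=index)

# x_compat=True asks pandas for a plain matplotlib date axis.
ax = df.plot(x_compat=True)

# With a plain date axis, the usual matplotlib.dates tools apply.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))   # a tick every 2 hours
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # show hours and minutes only

fig = ax.get_figure()
fig.autofmt_xdate()  # rotate the tick labels so they do not overlap
```

Call plt.show() (after importing matplotlib.pyplot as plt, with an interactive backend) or fig.savefig(...) to view the result.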

0
votes

This appears to happen when there are missing values for one column. In the graph on the left all values are present; in the one on the right, there are missing values between 9am and 10am.