When plotting a time series with pandas using dates, the plot is completely wrong, as are the dates along the x-axis. For some reason the data are plotted against dates not even in the dataframe.

This is for plotting multiple sensors with independent clocks and different sampling frequencies. I want to plot all sensors in the same figure for comparison.

I have tried sorting the dataframe in ascending order, and assigning the datetime column as the dataframe index without effect. When plotting the data set against the timestamp instead, plots for each sensor look fine.

Excerpt from a typical CSV file:

    Timestamp Date Clock DC3 HR DC4
    13 18.02.2019 08:24:00  19,12   61  3
    14 18.02.2019 08:26:00  19,12   38  0
    15 18.02.2019 08:28:00  19,12   52  0
    16 18.02.2019 08:30:00  19,12   230 2
    17 18.02.2019 08:32:00  19,12   32  3

The following code produces the problem for me:

    import pandas as pd
    from scipy.signal import savgol_filter

    columns = ['Timestamp', 'Date', 'Clock', 'DC3', 'HR', 'DC4']

    data = pd.read_csv('Exampledata.DAT',
                       sep=r'\s|\t',
                       header=19,
                       names=columns,
                       parse_dates=[['Date', 'Clock']],
                       engine='python')

    data['HR'] = savgol_filter(data['HR'], 201, 3)  # Smoothing

    ax = data.plot(x='Date_Clock', y='HR', label='Test')

The expected result should look like this, only with dates along the x-axis:

Imgur

The actual result is: Imgur

An example of a complete data file can be downloaded here: https://filesender.uninett.no/?s=download&token=ae8c71b5-2dcc-4fa9-977d-0fa315fedf45

How can this issue be addressed?

2 Answers

1
votes

This issue is resolved by not using parse_dates when loading the file, but instead creating the datetime vector like this:

    import pandas as pd
    from scipy.signal import savgol_filter

    columns = ['Timestamp', 'Date', 'Clock', 'DC3', 'HR', 'DC4']

    data = pd.read_csv('Exampledata.DAT',
                       sep=r'\s|\t',
                       header=19,
                       names=columns,
                       engine='python')

    # Date and Clock are concatenated without a space, so the format has none either
    data['Timestamp'] = pd.to_datetime(data['Date'] + data['Clock'],
                                       format='%d.%m.%Y%H:%M:%S')

    data['HR'] = savgol_filter(data['HR'], 201, 3)  # Smoothing

    ax = data.plot(x='Timestamp', y='HR', label='Test')

This creates the following plot:

Imgur

Which is the plot I want.
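
A likely reason parse_dates misbehaved (this is my reading, not something confirmed above): the dates in the file are day-first, and when no format or dayfirst=True is given, pandas reads an ambiguous string such as 01.03.2019 as month-first. A minimal sketch with a made-up timestamp in the file's layout:

```python
import pandas as pd

# Made-up timestamp in the same layout as the file (day.month.year)
ambiguous = "01.03.2019 08:24:00"  # intended: 1 March 2019

# No format given: pandas defaults to month-first for ambiguous dates
guessed = pd.to_datetime(ambiguous)

# Explicit format, as in the code above (here with a space between date and time)
explicit = pd.to_datetime(ambiguous, format="%d.%m.%Y %H:%M:%S")

# Alternative: keep automatic parsing but state that the day comes first
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(guessed)    # 2019-01-03 08:24:00
print(explicit)   # 2019-03-01 08:24:00
print(day_first)  # 2019-03-01 08:24:00
```

If that is the cause, passing dayfirst=True to read_csv together with parse_dates may also work, but I have not tested it on this file.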

0
votes

You get a hard-to-read graph because matplotlib plots one point per row. If you want a graph that is easier to read, you could use the resample() function to aggregate your entries to one per day (or one per week or month, if you prefer). You have two main options when resampling: take the sum of all the entries in each period, or take their mean. I arbitrarily chose the mean.

Here is what it could look like:

    import pandas as pd

    # Load the Excel file (it has no header row)
    filename = 'data_test.xlsx'
    df1 = pd.read_excel(filename, header=None)
    df1.columns = ['to_delete', 'Timestamp', 'DC3', 'HR', 'DC4', 'DC5']
    df1 = df1.drop(columns='to_delete')
    # Parse the day-first timestamps with an explicit format
    df1['Timestamp'] = pd.to_datetime(df1['Timestamp'], format='%d.%m.%Y %H:%M:%S')

    # Put the timestamp in the index, since resample() needs a datetime index
    df1 = df1.set_index('Timestamp')
    # Resample to one row per day, taking the mean of each day
    df1 = df1.resample('1D').mean()

    # Plot the graph
    ax = df1.plot(y='HR', label='Test')

Here is the graph with resampling:

with_resample

For comparison, here is the graph without resampling:

enter image description here
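
To make the difference between the two resampling options (sum versus mean) concrete, here is a small sketch on made-up heart-rate values. The numbers and timestamps are invented for illustration and are not taken from the asker's file:

```python
import pandas as pd

# Invented readings: two per day over two days
index = pd.to_datetime([
    "2019-02-18 08:24:00",
    "2019-02-18 08:26:00",
    "2019-02-19 08:24:00",
    "2019-02-19 08:26:00",
])
hr = pd.Series([60.0, 40.0, 50.0, 70.0], index=index, name="HR")

# One value per day: average of that day's readings
daily_mean = hr.resample("1D").mean()

# One value per day: total of that day's readings
daily_sum = hr.resample("1D").sum()

print(daily_mean.tolist())  # [50.0, 60.0]
print(daily_sum.tolist())   # [100.0, 120.0]
```

For a quantity like heart rate the mean is usually the more meaningful choice, since the sum grows with the number of samples in each period.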