
Please excuse the poor style and inefficient solutions. All help is greatly appreciated.

Context:

I am trying to find the 6-week block, within one year of data, that shows the best rate of cycling performance gain. Performance is measured as the maximum effort produced over a given duration within a single ride record, e.g. the best 1-, 5- or 20-minute effort.

Tasks:

  1. Create a rolling 6-week window
  2. Fit a best-fit trend line to each window
  3. Keep the window with the largest positive slope
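Task 2 for a single window comes down to a least-squares line fit of effort against elapsed time. The sketch below is a minimal version that assumes the timestamps are plain `datetime` objects, as in the data below. It uses `np.polyfit` with degree 1 and measures time in fractional days, so the slope is in effort units per day. `trend_slope` is a hypothetical helper name.

```python
from datetime import datetime

import numpy as np


def trend_slope(timestamps, values):
    """Least-squares slope of values against elapsed time, in units per day."""
    t0 = timestamps[0]
    # Elapsed time since the first ride, in fractional days
    days = np.array([(t - t0).total_seconds() / 86400.0 for t in timestamps])
    # Degree-1 fit returns [slope, intercept]
    slope, _intercept = np.polyfit(days, np.asarray(values, dtype=float), 1)
    return slope


# cp1 efforts for the first three rides in the sample data below
rides = [datetime(2015, 10, 17, 12, 45, 13),
         datetime(2015, 10, 18, 11, 56, 35),
         datetime(2015, 10, 20, 9, 24, 52)]
cp1_slope = trend_slope(rides, [281.0, 343.0, 270.0])
```

Measuring time in days rather than as a row position keeps the slope meaningful when rides are irregularly spaced.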

Data:

ap1 = np.array([[datetime(2015, 10, 17, 12, 45, 13),
                 datetime(2015, 10, 18, 11, 56, 35),
                 datetime(2015, 10, 20, 9, 24, 52),
                 datetime(2015, 10, 23, 9, 27, 12),
                 datetime(2015, 10, 24, 12, 26, 33)],
                [281.0, 343.0, 270.0, 312.0, 320.0],
                [246.0, 305.0, 260.0, 283.0, 289.0],
                [236.0, 250.0, 239.0, 257.0, 245.0]], dtype=object)

Issue: I am stuck on Task 1. I have been trying to follow user2689410's answer on computing a rolling_mean over an irregular time series, and I hope to reuse its data-slicing method.

For now I only want to slice the dataset into rolling 42-day (6-week) intervals. Here is my progress so far:

from pandas import Series, DataFrame
import pandas as pd
from datetime import datetime, timedelta
import numpy as np

idx = ap1[0]
idx = pd.Index(idx)

ap1 = np.transpose(ap1)
ap1 = pd.DataFrame(ap1, index=idx, columns=['date', 'cp1', 'cp2', 'cp3'])
ap2 = ap1.drop('date', axis=1)

ap2 = DataFrame(ap2.copy())
idx = Series(ap2.index.to_pydatetime(), index=ap2.index)

for colname, col in ap2.items():
    # This line raises the TypeError shown below
    dslice = col[idx - pd.Timedelta('42D'):idx]

The for loop gives me the error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/series.py", line 642, in __getitem__
    return self._get_with(key)
  File "/usr/local/lib64/python2.7/site-packages/pandas/core/series.py", line 647, in _get_with
    indexer = self.index._convert_slice_indexer(key, kind='getitem')
  File "/usr/local/lib64/python2.7/site-packages/pandas/indexes/base.py", line 1208, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step, kind=kind)
  File "/usr/local/lib64/python2.7/site-packages/pandas/tseries/index.py", line 1497, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "/usr/local/lib64/python2.7/site-packages/pandas/indexes/base.py", line 2962, in slice_indexer
    kind=kind)
  File "/usr/local/lib64/python2.7/site-packages/pandas/indexes/base.py", line 3141, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "/usr/local/lib64/python2.7/site-packages/pandas/indexes/base.py", line 3084, in get_slice_bound
    slc = self.get_loc(label)
  File "/usr/local/lib64/python2.7/site-packages/pandas/tseries/index.py", line 1419, in get_loc
    stamp = Timestamp(key, tz=self.tz)
  File "pandas/tslib.pyx", line 405, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:9932)
  File "pandas/tslib.pyx", line 1475, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:26432)
TypeError: Cannot convert input to Timestamp

Where do I go from here?
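The traceback ends in `Timestamp(key, ...)`, which fails because both slice bounds in the loop are whole Series (`idx - ...` and `idx`). A label slice on a DatetimeIndex needs a single timestamp at each end. The sketch below keeps the same trailing 42-day window but loops over the index one timestamp at a time, so every bound is a scalar. It assumes a frame shaped like `ap2`, with a sorted DatetimeIndex and float columns. The dictionary name `slices` is hypothetical.

```python
from datetime import datetime

import pandas as pd

# Rebuild the question's ap2: ride dates as the index, three effort columns
dates = pd.DatetimeIndex([datetime(2015, 10, 17, 12, 45, 13),
                          datetime(2015, 10, 18, 11, 56, 35),
                          datetime(2015, 10, 20, 9, 24, 52),
                          datetime(2015, 10, 23, 9, 27, 12),
                          datetime(2015, 10, 24, 12, 26, 33)])
ap2 = pd.DataFrame({'cp1': [281.0, 343.0, 270.0, 312.0, 320.0],
                    'cp2': [246.0, 305.0, 260.0, 283.0, 289.0],
                    'cp3': [236.0, 250.0, 239.0, 257.0, 245.0]},
                   index=dates)

window = pd.Timedelta('42D')

# One trailing slice per (column, end timestamp); both bounds are scalars
slices = {}
for colname, col in ap2.items():
    slices[colname] = [col[end - window:end] for end in ap2.index]
```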


2 Answers


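Nowadays, pandas.DataFrame.rolling can handle irregular time series directly. Give the frame a DatetimeIndex and pass a time offset such as '42D' as the window instead of a fixed number of rows.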
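Here is a minimal sketch of that approach applied to this problem, using the question's sample data and assuming a reasonably recent pandas. `rolling('42D')` gives each ride a trailing 42-day window. A hypothetical `slope_per_day` helper fits a least-squares line inside each window through `apply(..., raw=False)`, which passes each window as a Series so that its timestamps are available.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Sample data from the question: ride timestamps plus three effort columns
dates = [datetime(2015, 10, 17, 12, 45, 13),
         datetime(2015, 10, 18, 11, 56, 35),
         datetime(2015, 10, 20, 9, 24, 52),
         datetime(2015, 10, 23, 9, 27, 12),
         datetime(2015, 10, 24, 12, 26, 33)]
df = pd.DataFrame({'cp1': [281.0, 343.0, 270.0, 312.0, 320.0],
                   'cp2': [246.0, 305.0, 260.0, 283.0, 289.0],
                   'cp3': [236.0, 250.0, 239.0, 257.0, 245.0]},
                  index=pd.DatetimeIndex(dates))


def slope_per_day(window):
    """Least-squares slope of the window's values against elapsed days."""
    if len(window) < 2:
        return np.nan  # a single ride has no trend
    days = (window.index - window.index[0]) / pd.Timedelta(days=1)
    return np.polyfit(days, window.to_numpy(), 1)[0]


# Time-based window: each row sees the rides in the 42 days ending at it
slopes = df.rolling('42D').apply(slope_per_day, raw=False)

# End date of the window with the steepest trend, per column
best_end = slopes.idxmax()
```

Because the windows are trailing, the earliest ones cover less than 42 days and may hold only two or three rides. In the five-ride sample the winner is a two-ride window. In practice you may want to require a minimum number of rides, for example with `min_periods`, before trusting a slope.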


I found a solution. It is not pretty, but it works. Feedback on making it more efficient is welcome. For each column, it produces a list of arrays, one per moving window, where each array holds the [date, value] pairs that fall inside that window.

import numpy as np
import pandas as pd

# Assumes ap1 is the 4 x N object array from the question:
# row 0 holds the ride datetimes, rows 1-3 hold cp1, cp2 and cp3
ap2 = pd.DataFrame(ap1[1:].T.astype(float),
                   index=pd.DatetimeIndex(ap1[0]),
                   columns=['cp1', 'cp2', 'cp3'])

window = pd.Timedelta('42D')

# Start dates whose full 42-day window still fits inside the data
idxwindow = ap2.index[ap2.index <= ap2.index[-1] - window]

# One list of windows per column (a dict instead of exec-created variables)
windows = {colname: [] for colname in ap2.columns}
for colname, col in ap2.items():
    for start in idxwindow:
        # Label slice covering [start, start + 42 days], both ends inclusive
        result = col[start:start + window]
        # Store the window as an (n, 2) array of [date, value] rows
        windows[colname].append(np.stack((result.index.date, result.values), axis=-1))

cp1, cp2, cp3 = windows['cp1'], windows['cp2'], windows['cp3']
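Task 3 could be built on top of these per-column lists. The sketch below is minimal and assumes that each window is the (n, 2) `[date, value]` object array built in the loop above. `window_slope` and `best_window` are hypothetical helper names, and slopes come out in effort units per day. Windows with fewer than two distinct dates are skipped because a line cannot be fitted to them.

```python
from datetime import date

import numpy as np


def window_slope(window):
    """Least-squares slope (units per day) of one [date, value] window."""
    days = np.array([(d - window[0, 0]).days for d in window[:, 0]], dtype=float)
    return np.polyfit(days, window[:, 1].astype(float), 1)[0]


def best_window(windows):
    """Return (slope, window) with the largest positive slope, or None."""
    scored = [(window_slope(w), w) for w in windows if len(set(w[:, 0])) >= 2]
    positive = [pair for pair in scored if pair[0] > 0]
    # Compare on the slope only, never on the arrays themselves
    return max(positive, key=lambda pair: pair[0]) if positive else None


# Two small windows in the same layout, built from the first cp1 rides
windows = [np.array([[date(2015, 10, 17), 281.0], [date(2015, 10, 18), 343.0]], dtype=object),
           np.array([[date(2015, 10, 17), 281.0], [date(2015, 10, 20), 270.0]], dtype=object)]
slope, best = best_window(windows)
```

With the loop above, `best_window(cp1)` would pick the steepest cp1 window. Note that `result.index.date` drops the time of day, so elapsed time is counted in whole days here.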