1
votes

I have a pandas dataframe with the following structure:

                     Date     Open     High      Low    Close  Volume
0     2003-10-01 00:00:00  1.16500  1.16700  1.16400  1.16690    1125
1     2003-10-01 01:00:00  1.16680  1.16790  1.16600  1.16720     933
............

Those are continuos time values because it is Eur/Usd data. I want to resample this creating a daily dataframe that uses as Open value the value on Open column for Date XXXX-XX-XX 09:00:00 and for Close value the Close column value for XXXX-XX-XX 16:00:00. High and low should be the higher high and the lower Low between XXXX-XX-XX 09:00:00 and XXXX-XX-XX 16:00:00. Volume should be the sum of volumes between XXXX-XX-XX 09:00:00 and XXXX-XX-XX 16:00:00. Is there an easy way to do that in pandas? How to?

Thanks

3
You have to rephrase your question. - Abbas
I have a 24 Hour financial time series OHLC and volume data with 1 hour timeframe, I Want to resample it to daily timeframe but using only data between 09:00:00 and 17:00:00 - Diego Di Tommaso

3 Answers

1
votes

Only times between 09:00:00 and 16:00:00.

between_time is a nice simple way to get the times you need

ts = ts.between_time('9:00','16:00')

Resample using recommended syntax:

and to avoid the 'future warning' with your implementation of resample try this:

ohlc_dict = {                                                                                                             
'Open':'first',                                                                                                    
'High':'max',                                                                                                       
'Low':'min',                                                                                                        
'Close': 'last',                                                                                                    
'Volume': 'sum'  }

dailyData = ts.resample('1d').agg(ohlc_dict)
1
votes

This is a two-step process. First you need to drop data that's outside of your daily hourly bounds; then you need to resample that to daily frequency.

Suppose this our timeseries:

import pandas as pd
import numpy as np
ts = pd.Series(np.random.random(72), index=pd.date_range('1/1/2011', periods=72, freq='H'))

To filter by hour, we can create a boolean array asking each timestamp in our data whether or not its hours is within our hours of interest, then use that to index our timeseries:

ts_filtered = ts[ts.index.map(lambda time: 9 <= time.hour <= 17)]

Then, to resample, just use resample:

daily_stats = ts_filtered.resample('D').mean()

Which gets us:

2011-01-01    0.507943
2011-01-02    0.416317
2011-01-03    0.573760
Freq: D, dtype: float64
1
votes

Thanks, I also found this solution:

ohlc_dict = {                                                                                                             
'Open':'first',                                                                                                    
'High':'max',                                                                                                       
'Low':'min',                                                                                                        
'Close': 'last',                                                                                                    
'Volume': 'sum'  }

df_filtered_daily = df_filtered.resample('D', how=ohlc_dict, closed='left', label='left')