3
votes

I use OHLC re-sampling of 1min time series data in Pandas, the 15min will work perfectly, for example on the following dataframe:

ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close': 'last'}
df.resample('15Min').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

Date Time             Open         High        Low        Close
------------------------------------------------------------------          
2011-02-01 09:30:00 3081.940    3086.860    3077.832    3081.214

2011-02-01 09:45:00 3082.422    3083.730    3071.922    3073.801

2011-02-01 10:00:00 3073.303    3078.345    3069.130    3078.345

2011-02-01 10:15:00 3078.563    3078.563    3071.522    3072.279

2011-02-01 10:30:00 3071.873    3071.873    3063.497    3067.364

2011-02-01 10:45:00 3066.735    3070.523    3063.402    3069.974

2011-02-01 11:00:00 3069.561    3069.981    3066.286    3069.981

2011-02-01 11:15:00 3070.602    3074.088    3070.373    3073.919

2011-02-01 13:00:00 3074.778    3074.823    3069.925    3069.925

2011-02-01 13:15:00 3070.096    3070.903    3063.457    3063.457

2011-02-01 13:30:00 3063.929    3067.358    3063.929    3067.358

2011-02-01 13:45:00 3067.570    3072.455    3067.570    3072.247

2011-02-01 14:00:00 3072.927    3081.357    3072.767    3080.175

2011-02-01 14:15:00 3078.843    3079.435    3076.733    3076.782

2011-02-01 14:30:00 3076.721    3081.980    3076.721    3081.912

2011-02-01 14:45:00 3082.822    3083.381    3076.722    3077.283

However, when I resample 1min to 1H, the problem comes out. I use default setting, and find the time start from 9 am, but the markert open at 9:30 am.

df.resample('1H').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

1HourOHLC Wrong in Morning

I then try to change the base setting, but fail in the afternoon session. The market should open at 13 pm and end at 15 pm, so there should be 13 pm, 14 pm, 15 pm, total 3 bars.

df.resample('60MIN',base=30).apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

1HourOHLC Wrong in afternoon

In conclusion, the problem is I want it fitting in market and has 6 bars (9:30,10:30,11:30,1:00,2:00,3:00), but resample in pandas only give me 5 bars (9:30,10:30,11:30,1:30,2:30)

I am searching for a long time on net. But no use. Please help or try to give some ideas how to achieve this. Thanks.

2
I don't think you can accomplish this, since resample is in fixed interval. You can try to resample into 2 separate dfs, then slice the dfs and join them together.Yeile

2 Answers

1
votes

I had the same issue and could'nt find help online. So i wrote this script to convert 1 min OHLC data into 1 hour.

This assumes market timings 9:15am to 3:30pm. If market timings are different simply edit the start_time and end_time to suit your needs.

I havent put any additional checks in case trading was suspended during market hours.

Hope the code is helpful to someone. :)

Sample csv format

Date,O,H,L,C,V
2020-03-12 09:15:00,3860,3867.8,3763.35,3830,58630
2020-03-12 09:16:00,3840.05,3859.4,3809.65,3834.6,67155
2020-03-12 09:17:00,3832.55,3855.4,3823.75,3852,51891
2020-03-12 09:18:00,3851.65,3860.95,3846.35,3859,42205
2020-03-12 09:19:00,3859.45,3860,3848.1,3851.55,33194

Code

from pandas import read_csv, to_datetime, DataFrame
from datetime import time

file_path = 'BAJFINANCE-EQ.csv'


def add(data, b):
    # utility function
    # appends the value in dictionary 'b'
    # to corresponding key in dictionary 'data'
    for (key, value) in b.items():
        data[key].append(value)


df = read_csv(file_path,
              parse_dates=True,
              infer_datetime_format=True,
              na_filter=False)

df['Date'] = to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')

# stores hourly data to convert to dataframe
data = {
    'Date': [],
    'O': [],
    'H': [],
    'L': [],
    'C': [],
    'V': []
}

start_time = [time(9, 15), time(10, 15), time(11, 15), time(
    12, 15), time(13, 15), time(14, 15), time(15, 15)]

end_time = [time(10, 14), time(11, 14), time(12, 14), time(
    13, 14), time(14, 14), time(15, 14), time(15, 29)]


# Market timings 9:15am to 3:30pm (6 hours 15 mins)
# We create 6 hourly bars and one 15 min bar
# as usually depicted in candlestick charts
i = 0
no_bars = df.shape[0]
while i < no_bars:

    if df.loc[i]['Date'].time() in end_time:
        end_idx = i + 1

        hour_df = df[start_idx:end_idx]

        add(data, {
            'Date': df.loc[start_idx]['Date'],
            'O':    hour_df['O'].iloc[0],
            'H':    hour_df['H'].max(),
            'L':    hour_df['L'].min(),
            'C':    hour_df['C'].iloc[-1],
            'V':    hour_df['V'].sum()
        })

    if df.loc[i]['Date'].time() in start_time:
        start_idx = i

        # optional optimisation for large datasets
        # skip ahead to loop faster
        i += 55

    i += 1


df = DataFrame(data=data).set_index(keys=['Date'])
# df.to_csv('out.csv')
print(df)
0
votes

Here's a part of answer for only Close in dataframe. And resample in pandas may not fullfill my original intention as Yelie said. Therefore, I try to extract the desired item by iterrows.

from datetime import datetime
from datetime import timedelta

def extract(df):
    data = pd.DataFrame()
    for index, row in df.iterrows():
        if index.to_pydatetime().minute == 30 and index.to_pydatetime().hour < 12 :
            data = data.append(row)
        elif index.to_pydatetime().minute == 0 and index.to_pydatetime().hour > 12 :
            data = data.append(row)
        elif index.to_pydatetime().minute == 29 and index.to_pydatetime().hour == 11 :
            row = row = row.rename(index.to_pydatetime() + timedelta(minutes = 1))
            data = data.append(row)
        elif index.to_pydatetime().minute == 59 and index.to_pydatetime().hour == 14 :
            row = row = row.rename(index.to_pydatetime() + timedelta(minutes = 1))
            data = data.append(row)
    return data

data = extract(df.loc['2011-02-01'])
data

However, the other items is not correct except close. The result is showed below:

Close                             High        Low         Open        Volume       turnover
2011-02-01 09:30:00 3081.940    3081.940    3081.940    3081.940    74767100.0  996328900.0
2011-02-01 10:30:00 3071.873    3071.873    3071.873    3071.873    18754100.0  250694100.0
2011-02-01 11:30:00 3073.919    3073.919    3073.919    3073.919    13762700.0  179169200.0
2011-02-01 13:00:00 3074.778    3074.778    3074.778    3074.778    25992700.0  321678500.0
2011-02-01 14:00:00 3072.927    3072.927    3072.927    3072.927    11682300.0  161534600.0
2011-02-01 15:00:00 3077.283    3077.283    3077.283    3077.283    68184500.0  930561900.0