0
votes

I'm trying to split time ranges that overlap multiple hours of the day into two rows, each representing the time in each hour (and leave time ranges where the start and end time are both within the same hour of the day alone).

Example dataframe:

df1 = pd.DataFrame({'id':['A','B','C'],
                   'start_time':['2021-03-01 11:52:11+00:00','2021-03-01 11:53:10+00:00','2021-03-01 12:01:54+00:00'],
                   'end_time':['2021-03-01 12:14:45+00:00','2021-03-01 12:09:40+00:00','2021-03-01 12:57:03+00:00']})
df1['start_time'] = pd.to_datetime(df1['start_time']) 
df1['end_time'] = pd.to_datetime(df1['end_time']) 

Which would look like:

--------------------------------------------------------------------------
index    id   start_time                     end_time
0        A    2021-03-01 11:52:11+00:00      2021-03-01 12:14:45+00:00
1        B    2021-03-01 11:53:10+00:00      2021-03-01 12:09:40+00:00
2        C    2021-03-01 12:01:54+00:00      2021-03-01 12:57:03+00:00
--------------------------------------------------------------------------

And I'm hoping for a result like:

--------------------------------------------------------------------------
index    id   start_time                     end_time
0        A    2021-03-01 11:52:11+00:00      2021-03-01 12:00:00+00:00
1        A    2021-03-01 12:00:00+00:00      2021-03-01 12:14:45+00:00
2        B    2021-03-01 11:53:10+00:00      2021-03-01 12:00:00+00:00
3        B    2021-03-01 12:00:00+00:00      2021-03-01 12:09:40+00:00
4        C    2021-03-01 12:01:54+00:00      2021-03-01 12:57:03+00:00
--------------------------------------------------------------------------
1

1 Answers

0
votes
def myfunc(sr):
    """Return all pairwise hours between start_time and end_time."""
    dtl = pd.date_range(sr["start_time"].floor("H"),
                        sr["end_time"].ceil("H"),
                        freq="H").tolist()
    dtl[0] = sr["start_time"]
    dtl[-1] = sr["end_time"]
    return list(zip(dtl[0:-1], dtl[1:]))

# One row -> (start, next_hour) or (curr_hour, next_hour) or (curr_hour, end)
pairs = df1.apply(myfunc, axis="columns").explode()

# Build output dataframe from list 'pairs'
out = pd.DataFrame(pairs.tolist(), 
                   columns=["start_time", "end_time"],
                   index=pairs.index)

# Merge with 'df1'
out = df1[["id"]].join(out).reset_index(drop=True)
>>> out
  id                start_time                  end_time
0  A 2021-03-01 11:52:11+00:00 2021-03-01 12:00:00+00:00
1  A 2021-03-01 12:00:00+00:00 2021-03-01 12:14:45+00:00
2  B 2021-03-01 11:53:10+00:00 2021-03-01 12:00:00+00:00
3  B 2021-03-01 12:00:00+00:00 2021-03-01 12:09:40+00:00
4  C 2021-03-01 12:01:54+00:00 2021-03-01 12:57:03+00:00