How to merge two dataframe based on time intervals and transform them

Question

I have two dataframes, first one is creating by users manually and second one is errors from machines. I want to merge them based on time interval in first dataframe(df_a)

Here are the dataframes;

d_a = {'Station' : ['A1','A2'],
       'Reason_a' : ['Electronic','Feed'],
       'StartTime_a' : ['2019-01-02 02:00:00','2019-01-02 04:22:00'],
       'EndTime_a' : ['2019-01-02 02:20:00', '2019-01-02 04:45:00']}

d_b = {'Station' : ['A1','A1','A1','A2','A2','A2'],
       'Reason_b' : ['a','n','c','d','e','n'],
       'StartTime_b' : ['2019-01-02 00:00:00.000','2019-01-02 00:05:00.000','2019-01-01 23:55:00.000','2019-01-02 04:19:53.000','2019-01-02 04:19:37.000','2019-01-02 04:23:00.000'],
       'EndTime_b' : ['2019-01-02 00:19:15.000','2019-01-02 00:29:45.000','2019-01-02 00:12:12.000','2019-01-02 04:27:12.000','2019-01-02 04:47:16.000','2019-01-02 04:52:45.000']}

df_a = pd.DataFrame(d_a)
df_b = pd.DataFrame(d_b)

Any intersection point of time intervals of two dataframes considered as valid record.

condition1 = df_b start_time start after df_a start time and ends before df_a endtime

condition2 = df_b start_time starts before df_a start time but ends before df_a endtime

condition3 = df_b start_timestarts between df_a starttime and df_a end time but ends after df_a endtime

In the end I want to merge these two dataframes based on conditions. my ideal table looks like below

 Station     Reason_a        a     n     c     d    e 
  A1         Electronic      1     1     1     0    0
  A2          Feed           0     1     0     1    0

How should I approach this problem? Any comment would be helpful.

Thanks in advance.

ivallesp ivallesp · Accepted Answer · 2019-08-15T11:00:03

I would solve it by merging the tables on station and calculating the intersections :D

import numpy as np

df = pd.merge(df_a, df_b, on="Station")

# Convert to date
for datevar in ["StartTime_a", "StartTime_b", "EndTime_a", "EndTime_b"]:
    df[datevar] = pd.to_datetime(df[datevar])

# Intersections definition
df["intersection"] = (((df.StartTime_a > df.StartTime_b) & (df.StartTime_a < df.EndTime_b)) |
                      ((df.StartTime_a < df.StartTime_b) & (df.EndTime_a > df.StartTime_b)))

# Filter only intersections
(df[["Station", "Reason_a", "Reason_b", "intersection"]]
.pivot_table(index=["Station", "Reason_a"], columns="Reason_b", aggfunc=np.sum)
.fillna(0).astype(int))

How to merge two dataframe based on time intervals and transform them

4 Answers