I have two DataFrames in pandas:
import pandas as pd
df1 = pd.DataFrame({'Name': ["A", "B", "C", "C","D","D","E"],
'start': [50, 124, 1, 159, 12, 26,110],
'stop': [60, 200, 19, 200, 24, 30,160]})
df2 = pd.DataFrame({'Name': ["B", "C","D","E"],
'start': [126, 143, 19, 159],
'stop': [129, 220, 27, 200]})
print(df1)
Name start stop
0 A 50 60
1 B 124 200
2 C 1 19
3 C 159 200
4 D 12 24
5 D 26 30
6 E 110 160
print(df2)
Name start stop
0 B 126 129
1 C 143 220
2 D 19 27
3 E 159 200
I want to filter df1 to remove rows based on df2 using the following criteria:
- Name should be present in both df1 and df2
- The range from start to stop for a Name overlaps with the range from start to stop for that Name in the other DataFrame
This would give:
Name start stop
0 B 124 200
1 C 159 200
2 D 12 24
3 D 26 30
4 E 110 160
Where:
- A has been dropped as there is no A in df2
- B is kept as the start and stop of B in df2 are nested in those of B in df1
- One of the C's of df1 has been dropped as its values didn't overlap with df2, whereas the other was kept as it is nested in the start and stop range of C in df2
- Both D's are kept as both have an overlap with the range of D in df2
- E is kept as its range overlaps with E in df2
Any help would be greatly appreciated!