How to Combine Two Dataframes By Date in Python

Question

I have given two dataframes.

Dataframe 1:

Date_DF1	Event	Event2	Event3
2021-01-01	Nan	PandemicHoliday	NaN
2021-02-01	Nan	PandemicHoliday	NaN
2021-03-01	Nan	PandemicHoliday	NaN
2021-04-02	SpecialDay	NaN	NaN
2021-14-02	SpecialDay	PandemicHoliday	NaN

The first dataframe is a .csv file that includes all holidays between 2017-2021 years. Date column is datetime format. If there is more than one holiday on the same day, the name of the holiday is written in all of the Event, Event1 and Event2 columns. Event, Event1 and Event2 columns include SpecialDay, PandemicHoliday, NationalHoliday values (3 types of holiday).

Dataframe 2:

Date_DF2	OrderTotal	OrderID
2021-01-01	68.5	31002
2021-01-01	56.5	31003
2021-01-01	98.5	31004
2021-01-02	78.5	31005

The second dataframe contains the daily order frequency. Date columns is datetime format.

Not all dates in df2 exist in df1.

I want to add the Event, Event1 and Event2 columns in the first table to the second table. The second table contains more than one column from the same date. Each holiday will be added to the second table as a column. How can I do this in python? Result table will look like this:

Date	OrderTotal	OrderID	SpecialDay	PandemicHoliday
2021-01-01	68.5	31002	0	1
2021-01-01	68.5	31003	0	1
2021-01-01	68.5	31004	0	1
2021-01-02	78.5	31005	1	0

perl perl · Accepted Answer · 2021-05-10T15:19:29

You can one-hot-encode df1 with pd.get_dummies, then merge:

df2.merge(
    pd.get_dummies(df1.set_index('Date_DF1').stack()).sum(level=0),
    left_on='Date_DF2',
    right_index=True,
    how='left').fillna(0)

Output:

    Date_DF2  OrderTotal  OrderID  PandemicHoliday  SpecialDay
0 2021-01-01        68.5    31002                1           0
1 2021-01-01        56.5    31003                1           0
2 2021-01-01        98.5    31004                1           0
3 2021-01-02        78.5    31005                0           1

How to Combine Two Dataframes By Date in Python

1 Answers