I have given two dataframes.
Dataframe 1:
| Date_DF1 | Event | Event2 | Event3 |
|---|---|---|---|
| 2021-01-01 | Nan | PandemicHoliday | NaN |
| 2021-02-01 | Nan | PandemicHoliday | NaN |
| 2021-03-01 | Nan | PandemicHoliday | NaN |
| 2021-04-02 | SpecialDay | NaN | NaN |
| 2021-14-02 | SpecialDay | PandemicHoliday | NaN |
The first dataframe is a .csv file that includes all holidays between 2017-2021 years. Date column is datetime format. If there is more than one holiday on the same day, the name of the holiday is written in all of the Event, Event1 and Event2 columns. Event, Event1 and Event2 columns include SpecialDay, PandemicHoliday, NationalHoliday values (3 types of holiday).
Dataframe 2:
| Date_DF2 | OrderTotal | OrderID |
|---|---|---|
| 2021-01-01 | 68.5 | 31002 |
| 2021-01-01 | 56.5 | 31003 |
| 2021-01-01 | 98.5 | 31004 |
| 2021-01-02 | 78.5 | 31005 |
The second dataframe contains the daily order frequency. Date columns is datetime format.
Not all dates in df2 exist in df1.
I want to add the Event, Event1 and Event2 columns in the first table to the second table. The second table contains more than one column from the same date. Each holiday will be added to the second table as a column. How can I do this in python? Result table will look like this:
| Date | OrderTotal | OrderID | SpecialDay | PandemicHoliday | NationalHoliday |
|---|---|---|---|---|---|
| 2021-01-01 | 68.5 | 31002 | 0 | 1 | 0 |
| 2021-01-01 | 68.5 | 31003 | 0 | 1 | 0 |
| 2021-01-01 | 68.5 | 31004 | 0 | 1 | 0 |
| 2021-01-02 | 78.5 | 31005 | 1 | 0 | 0 |