0
votes

I have given two dataframes.

Dataframe 1:

Date_DF1 Event Event2 Event3
2021-01-01 Nan PandemicHoliday NaN
2021-02-01 Nan PandemicHoliday NaN
2021-03-01 Nan PandemicHoliday NaN
2021-04-02 SpecialDay NaN NaN
2021-14-02 SpecialDay PandemicHoliday NaN

The first dataframe is a .csv file that includes all holidays between 2017-2021 years. Date column is datetime format. If there is more than one holiday on the same day, the name of the holiday is written in all of the Event, Event1 and Event2 columns. Event, Event1 and Event2 columns include SpecialDay, PandemicHoliday, NationalHoliday values (3 types of holiday).

Dataframe 2:

Date_DF2 OrderTotal OrderID
2021-01-01 68.5 31002
2021-01-01 56.5 31003
2021-01-01 98.5 31004
2021-01-02 78.5 31005

The second dataframe contains the daily order frequency. Date columns is datetime format.

Not all dates in df2 exist in df1.

I want to add the Event, Event1 and Event2 columns in the first table to the second table. The second table contains more than one column from the same date. Each holiday will be added to the second table as a column. How can I do this in python? Result table will look like this:

Date OrderTotal OrderID SpecialDay PandemicHoliday NationalHoliday
2021-01-01 68.5 31002 0 1 0
2021-01-01 68.5 31003 0 1 0
2021-01-01 68.5 31004 0 1 0
2021-01-02 78.5 31005 1 0 0
1

1 Answers

2
votes

You can one-hot-encode df1 with pd.get_dummies, then merge:

df2.merge(
    pd.get_dummies(df1.set_index('Date_DF1').stack()).sum(level=0),
    left_on='Date_DF2',
    right_index=True,
    how='left').fillna(0)

Output:

    Date_DF2  OrderTotal  OrderID  PandemicHoliday  SpecialDay
0 2021-01-01        68.5    31002                1           0
1 2021-01-01        56.5    31003                1           0
2 2021-01-01        98.5    31004                1           0
3 2021-01-02        78.5    31005                0           1