5
votes

I have two dataframes, "bio" with monthly data, where I only need the column bio['Bio_Avg']. And another one "tc" with daily data. Both with Datetime index.

What I need is to merge both keeping daily granularity but adding monthly data repeating the value of the column bio['Bio_Avg'].

print(bio.head(5))
                        year  month   ncena    ...      contingency  D   Bio_Avg
            Date                              ...                              
            2019-12-01  2019   12       NaN    ...              NaN  1  38826.48
            2019-11-01  2019   11       NaN    ...              NaN  1  33867.68
            2019-10-01  2019   10       NaN    ...              NaN  1  31358.80
            2019-09-01  2019    9       NaN    ...              NaN  1  29583.84
            2019-08-01  2019    8       NaN    ...              NaN  1  27763.12

and

print(tc.head(5))
                               exchg_rate
            2019-11-01            59.71
            2019-11-02            59.74
            2019-11-03            59.73
            2019-11-04            59.70
            2019-11-05            59.65

The expected result should be:

                               exchg_rate   Bio_Avg
            2019-11-01            59.71     33867.68
            2019-11-02            59.74     33867.68
            2019-11-03            59.73     33867.68
            2019-11-04            59.70     33867.68
            2019-11-05            59.65     33867.68

I've tried:

merge = pd.merge(tc, bio, left_index=True, right_index=True)
print(merge.head(5))

And I get:

                               exchg_rate year month  ...      contingency  D   Bio_Avg
            2019-11-01            59.75  2019   11    ...              NaN  1  33867.68
            2019-12-01            59.94  2019   12    ...              NaN  1  38826.48

Off course it keeps only the exact match with the date, but I loose the days in between. What could be an elegant way to solve this without breaking all dataframes?

1

1 Answers

3
votes

first we will reset the index on df2 and then merge on your month column, if the month column wasn't present we could use df['Date'].dt.month

#df1 = bio

#df2 = tc

#df2.reset_index(inplace=True)
df2 = df2.reset_index()


merged_df = pd.merge(
    df2,
    df[["month", "Bio_Avg"]],
    left_on=df2['Date'].dt.month,
    right_on="month"

).drop("month", axis=1)

print(merged_df)

        Date  exchg_rate   Bio_Avg
0 2019-11-01       59.71  33867.68
1 2019-11-02       59.74  33867.68
2 2019-11-03       59.73  33867.68
3 2019-11-04       59.70  33867.68
4 2019-11-05       59.65  33867.68

Edit:

working with multiple columns to join on, instead of of using your pre-built month + year column, we can extract them from the actual Date column ensure it's a date time by using pd.to_datetime(your_dfs['Date'])

    final =pd.merge(
    bio[["Bio_Avg", "Date"]],
    tc,
    left_on=[bio["Date"].dt.year, bio["Date"].dt.month],
    right_on=[tc["Date"].dt.year, tc["Date"].dt.month],
    how="right",
    suffixes=("bio", "tc_"),
).drop(["key_0", "key_1"], axis=1)

print(final)

    Bio_Avg    Datebio    Datetc_  exchg_rate
0  33867.68 2019-11-01 2019-11-01       59.71
1  33867.68 2019-11-01 2019-11-02       59.74
2  33867.68 2019-11-01 2019-11-03       59.73
3  33867.68 2019-11-01 2019-11-04       59.70
4  33867.68 2019-11-01 2019-11-05       59.65