0
votes

I want to merge two DataFrames:

df1:

               dt_object      Lng
1    2020-01-01 00:00:00  1.57423
2    2020-01-01 01:00:00  1.57444
3    2020-01-01 02:00:00  1.57465
4    2020-01-01 03:00:00  1.57486

df2:

               dt_object       Price
0    2020-01-03 10:00:00  256.086667
1    2020-01-03 11:00:00  256.526667
2    2020-01-03 12:00:00  257.386667
3    2020-01-03 13:00:00  256.703333
4    2020-01-03 14:00:00  255.320000

dt_object in both cases has type datetime64

df1 never has missing rows. So it has 24 hours per day.

But df2 HAS missing rows.

When I combine them, there is a mismatch.

df = pd.merge(df1, df2, on = 'dt_object')

Merged df:

               dt_object      Lng       Price
0    2020-04-01 10:00:00  1.59270  183.996667
1    2020-04-01 11:00:00  1.59294  184.466667
2    2020-04-01 12:00:00  1.59319  184.810000
3    2020-04-01 13:00:00  1.59343  184.386667
4    2020-04-01 14:00:00  1.59367  184.533333

Problems:

  1. Lng 1.59270 is in wrong place. It flew in 2020-04-01 10:00:00 from 04.01.2020 10:00:00 (month and date are messed up). But Price 183.996667 is in correct place. So ALL Lng flew from wrong date with messed up Date/Month.
  2. Prices in df2 start from January 2020-01-03 10:00:00, but merged dataframe starts from April 2020-04-01

When I saw this problem, I added for both dataframes:

df1['dt_object'] = pd.to_datetime(df1['dt_object'], format='%Y-%m-%d %H:%M:%S')
df2['dt_object'] = pd.to_datetime(df2['dt_object'], format='%Y-%m-%d %H:%M:%S')

, but it didn't help. Nothing changed. Inside dt_object is a strange bug with month/date, but I cannot detect it. Help me to fix it , please!

1
have you tried adding this?: pd.merge(df1, df2, on = 'dt_object', how = 'left') I think the problem is an inner join, which only keeps rows when there's a match - EMiller

1 Answers

1
votes

You have to specify that you want to execute a left join. The Pandas Documentation explains what the different options for what the how parameter will do.

>>> df1 = pd.DataFrame({'dt_object': pd.date_range('2020-01-01', '2020-01-04'), 'Lng': [0, 1, 2, 3]})
>>> df1
   dt_object  Lng
0 2020-01-01    0
1 2020-01-02    1
2 2020-01-03    2
3 2020-01-04    3

>>> df2 = pd.DataFrame({'dt_object': [pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-02'), pd.Timestamp('2020-01-04')], 'Price': [1000, 2000, 3000]})
>>> df2
   dt_object  Price
0 2020-01-01   1000
1 2020-01-02   2000
2 2020-01-04   3000

>>> df1.merge(df2, how='left')
   dt_object  Lng   Price
0 2020-01-01    0  1000.0
1 2020-01-02    1  2000.0
2 2020-01-03    2     NaN
3 2020-01-04    3  3000.0