Comparing columns of two Data Frames and returning the values of a different column using Pandas

Question

I'm new to pandas. Given two data frames:

df_1

product_id	product_price	invoice_total
p1	100	200
p2	200	300
p3	300	600
p4	400	700

df_2

product_id	quantity	invoice_total
p1	8	700
p6	3	900
p2	5	600

I want to check if the product id in df1 is similar to that of df2 and if so, pick the value of invoice total in df2.

I've tried the for loop:

df_new = pd.DataFrame()
for i in df1.product_id:
   for j in df2.product_id:
      if i == j:
         # return the value of df2.invoice_total and append to the df_new.

But I believe there's a better way.

The result should be something like this:

product_id	invoice_total
p1	700
p2	600

What do you mean by similar? same product_id? if so, you can use merge — gtomer

Kurt Kline Kurt Kline · Accepted Answer · 2021-02-05T09:04:14

You can use a dataframe merge for this

import pandas as pd

df_1 = pd.DataFrame({
    'product_id': ['p1', 'p2', 'p3', 'p4'],
    'product_price': [100, 200, 300, 400],
    'invoice_total': [200, 300, 600, 700]
})

df_2 = pd.DataFrame({
    'product_id': ['p1', 'p6', 'p2'],
    'quantity': [8, 3, 5],
    'invoice_total': [700, 900, 600]
})

df_merged = df_1.merge(
    df_2,
    on='product_id',
    suffixes=('_df1', '')
)

Contents of df_merged

  product_id  product_price  invoice_total_df1  quantity  invoice_total
0         p1            100                200         8            700
1         p2            200                300         5            600

Then filter to only the columns you need

df_merged = df_merged[['product_id', 'invoice_total']]

Final contents of df_merged

  product_id  invoice_total
0         p1            700
1         p2            600

Comparing columns of two Data Frames and returning the values of a different column using Pandas

2 Answers