1
votes

I'm new to pandas. Given two data frames:

df_1

product_id product_price invoice_total
p1 100 200
p2 200 300
p3 300 600
p4 400 700

df_2

product_id quantity invoice_total
p1 8 700
p6 3 900
p2 5 600

I want to check if the product id in df1 is similar to that of df2 and if so, pick the value of invoice total in df2.

I've tried the for loop:

df_new = pd.DataFrame()
for i in df1.product_id:
   for j in df2.product_id:
      if i == j:
         # return the value of df2.invoice_total and append to the df_new.
         

But I believe there's a better way.

The result should be something like this:

product_id invoice_total
p1 700
p2 600
2
What do you mean by similar? same product_id? if so, you can use merge - gtomer
use pd.merge() to get this done - Joe Ferndz

2 Answers

0
votes

You can use a dataframe merge for this

import pandas as pd

df_1 = pd.DataFrame({
    'product_id': ['p1', 'p2', 'p3', 'p4'],
    'product_price': [100, 200, 300, 400],
    'invoice_total': [200, 300, 600, 700]
})

df_2 = pd.DataFrame({
    'product_id': ['p1', 'p6', 'p2'],
    'quantity': [8, 3, 5],
    'invoice_total': [700, 900, 600]
})

df_merged = df_1.merge(
    df_2,
    on='product_id',
    suffixes=('_df1', '')
)

Contents of df_merged

  product_id  product_price  invoice_total_df1  quantity  invoice_total
0         p1            100                200         8            700
1         p2            200                300         5            600

Then filter to only the columns you need

df_merged = df_merged[['product_id', 'invoice_total']]

Final contents of df_merged

  product_id  invoice_total
0         p1            700
1         p2            600
0
votes

I am assuming you want to do a "left join".

df_new = df1.join(df2, how = 'left')

then in df_new you can select the 'invoice_total' column and make a list of it by:

df_new['invoice_total'].values.tolist()

let me know if this is what you were looking for ;).