1
votes

I have two data frames one contains the data second contains codes and their decode values. i want to match df1[code] with df2[code] and paste df2[value] in df1. it should be noted that my second data frame contains code and values once, its basically a sheet of codes and values but in first data frame the codes are repeating so the values column which will be pasted should represent the value every time when the code appears in df1[code] column.

df1[code] df2[code] df2[value]
234 000 Three
235 234 Two
238 238 Four
337 235 Five

i need as following:

df1[code] df1[value]
234 Two
235 Five
238 Four
337 Null

basically translation of codes in one data frame from second data frame.

2
look at merge if you have 2 dataframes df1 and df2 , df1.merge(df2,on='Code') . - anky
could you show your attempted codes? - ABC
merge is not working as df1 has many columns so it gives error of datatypes. - i202075 Mirza Faraz Hassan
i have tried to convert df2 into dictionary as well . but then the matching doesnt takes place with df1[code] and all printed values are null\ - i202075 Mirza Faraz Hassan

2 Answers

1
votes

Suppose that you dataframes are the following ones:

df1
   code     something  some_number
0   210  SOMETHING_28     0.206017
1   913  SOMETHING_36     0.810195
2   210  SOMETHING_18     0.258638
3  None             a     0.000000
df2
   code     value
0   210  VALUE_01
1   590  VALUE_02
2   614  VALUE_03
3   696  VALUE_04
4   913  VALUE_05

Then, you can use merge, changing the type of the code column, if needed (e.g., if it is a string):

df1.code = df1.code.map(lambda x: np.int64(x) if x else np.nan).astype('Int64')
df2.code = df2.code.astype('Int64')
merged_df = df1.merge(df2, on='code', how='left')

And you get:

   code     value
0   210  VALUE_01
1   913  VALUE_05
2   210  VALUE_01
3  <NA>       NaN

Here the code to create df1 and df2 with the same structure as the ones shown in this answer:

import pandas as pd
import numpy as np

codes = sorted(np.random.randint(1, 1000, 5))
values = [f'VALUE_{x:02.0f}' for x in range(1, len(codes) + 1)]

df1 = pd.DataFrame(
    data=[
        [c, f'SOMETHING_{np.random.randint(1, 50)}', np.random.random()]
        for c in np.random.choice(codes, 3)
    ],
    columns=['code', 'something', 'some_number']
)

df2 = pd.DataFrame(
    data=list(zip(codes, values)),
    columns=['code', 'value']
)
0
votes

How about use a map-dict:

map_dict = dict(zip(df2['code'], df2['value']))
df1['value'] = df1['code'].map(map_dict)