2
votes

I have two data frames (df1 and df2), I am planning to merge df1 and df2 based on a columns 'pin', 'pin1', pin2' which are in df1 on column 'pincode2' in df2.

Structure:

              df1                             df2

ID Name pin   pin1  pin2  pin3            pincode  value
1  A1   1001  4521  3223  6433            1001     W1
2  A5   1002        4556                  1002     O1
3  X3   4531  4341        4563            1003     X3
4  A4   4633  4633  4633  4633            1004     Z2
5  B3   5346              3452            1005     Y9
6  B4   1001                              1006     G4
7  F5               6512                  1007     R3

Now I am using the following code to perform merge in pandas:

# Importing libraries 

import pandas as pd
import cvs as csv
#********************************************************************#

# Importing the files

df1= pd.read_csv('/Users/one/Input/df1.csv', encoding = "ISO-8859-1")
df2= pd.read_csv('/Users/one/Input/df2.csv', encoding = "ISO-8859-1")

#*************************************************************#   

# Creating a column as a flag

df1['mapping_type']= 'Unmapped'

#*************************************************************# 

# Merging/Joining the two data frame

df_first_run = pd.merge(left=df1, right=df2, left_on= 'pin', right_on= 'pincode', how='left')

As you can see, I need repeat this merge with column 'pin1' 'pin2' of the df1. However I don't want to keep changing the column names df2 or else I will replace the previous got values.

Also the column mapped_type must change to show what column was mapped.

Expected Output:

    ID Name pin   pin1  pin2  pin3   value  mapping_type
    1  A1   1001  4521  3223  6433   W1     pin
    2  A5   1002        4556         O2     pin
    3  X3   4531  4341        4563   P3     pin3
    4  A4   4633  4633  4633  4633   Q2     pin2
    5  B3   5346              3452          Unmapped
    6  B4   1001                     W1     pin
    7  F5               6512         Q2     pin2
1

1 Answers

1
votes

You could use the map function instead of merging.

Just turn df2 into a hash and map each series separately.

mapping = df2.to_dict()['value']
for col in ['pin', 'pin1', 'pin2', 'pin3']:
    df1['{}_value'.format(col)] = df1[col].map(mapping)

This will with your example data produce this:

   Name   pin  pin1  pin2  pin3 pin_value pin1_value pin2_value pin3_value
ID                                                                        
1    A1  1001  4521  3223  6433        W1        NaN        NaN        NaN
2    A5  1002  4556   NaN   NaN        O1        NaN        NaN        NaN
3    X3  4531  4341  4563   NaN       NaN        NaN        NaN        NaN
4    A4  4633  4633  4633  4633       NaN        NaN        NaN        NaN
5    B3  5346  3452   NaN   NaN       NaN        NaN        NaN        NaN
6    B4  1001   NaN   NaN   NaN        W1        NaN        NaN        NaN
7    F5  6512   NaN   NaN   NaN       NaN        NaN        NaN        NaN

Which is not exactly what you asked for, but maybe good enough.

If you want to have exactly what you asked for, it gets a bit more complicated:

mapping = df2.to_dict()['value']
for col in ['pin', 'pin1', 'pin2', 'pin3']:
    mappedvalues = df1[col].map(mapping)
    hasmapping = pd.notnull(mappedvalues)
    df1.loc[hasmapping, 'value'] = mappedvalues
    df1.loc[hasmapping, 'mapping'] = col

Gives

   Name   pin  pin1  pin2  pin3  value mapping
ID                                            
1    A1  1001  4521  3223  6433     W1     pin
2    A5  1002  4556   NaN   NaN     O1     pin
3    X3  4531  4341  4563   NaN    NaN        
4    A4  4633  4633  4633  4633    NaN        
5    B3  5346  3452   NaN   NaN    NaN        
6    B4  1001   NaN   NaN   NaN     W1     pin
7    F5  6512   NaN   NaN   NaN    NaN        

However, it should be noted that the last column to change the value will be the column mentioned in 'mapping'. This feels like bad practice and I would probably go with the example above.