2
votes

I have two dataframes, df1 and df2. They unique identifiers for each dataframe are 'ID' and 'Prop_Number'. I need to copy over the Num1, 2 and 3 columns from df1 into the corresponding columns in df2, 1_Num... But I'm not sure how to do a merge for more than column. I want to keep df2 as df2, not create a new df (because my real data has more columns in df2) that will remain as is.

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
        ['87', '.372', '2345', '843'],
        ['15', '1.234','742', '821'],
        ['92', '1.957', '1234', '123'],
        ['13', '.943', '8427', '493'],
        ['67', '.852', '3421', '439']
       ]
df1 = pd.DataFrame(data=data1, columns=cols1)

cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
        ['33', '', '', ''],
        ['67', '','', ''],
        ['13', '', '', ''],
        ['92', '', '', ''],
        ['15', '', '', '']
       ]
df2 = pd.DataFrame(data=data2, columns=cols2)

What I've tried is

df2['1_Num'] = np.where(df1['ID'] == df2['Prop_Number'], df1['Num1'],np.nan)

2
are column names the same in both dfs or different?Umar.H
column names are different but you can relate them - df1['Num1'] = df2['1_Num']...DeAnna Martinez

2 Answers

2
votes

You can try this:

cols1 = ['ID', 'Num1', 'Num2', 'Num3']
data1 = [['33', '.853', '9834', '234'],
        ['87', '.372', '2345', '843'],
        ['15', '1.234','742', '821'],
        ['92', '1.957', '1234', '123'],
        ['13', '.943', '8427', '493'],
        ['67', '.852', '3421', '439']
       ]
df1 = pd.DataFrame(data=data1, columns=cols1)

cols2 = ['Prop_Number', '1_Num', '2_Num', '3_Num']
data2 = [['87', '', '', ''],
        ['33', '', '', ''],
        ['67', '','', ''],
        ['13', '', '', ''],
        ['92', '', '', ''],
        ['15', '', '', '']
       ]
df2 = pd.DataFrame(data=data2, columns=cols2)

df2 = df2.set_index('Prop_Number')
df2.update(df1.rename(columns=dict(zip(df1.columns[1:],
                                       ['1_Num','2_Num','3_Num'])))
              .set_index('ID'))
df2 = df2.reset_index()
print(df2)

Output:

  Prop_Number  1_Num 2_Num 3_Num
0          87   .372  2345   843
1          33   .853  9834   234
2          67   .852  3421   439
3          13   .943  8427   493
4          92  1.957  1234   123
5          15  1.234   742   821

Details: rename df1 columns to match df2 columns and use set_index, with update to modify df2.

2
votes

Scott Provided an excellent answer, but I was interested in your matching of columns by numbers and thought this is could help with your problem.

the idea is to regex match all numeric type columns in your data frame and then sort them by the number, this allows us to match the columns from df1 to df2 :

also as you have differently named indices your index will be returned blank, you can update this manually.

def match_numeric_columns(dataframe1, dataframe2):

"""
the first argument will be the dataframe you want to rename
takes in two dataframes and returns their alphanumeric 
values as matches. e.g col1a = 1cola or Data_225 = 225_Info
"""


   cola = (
        dataframe1.filter(regex="\d").columns)

   colb = (
    dataframe2.filter(regex="\d").columns)

   all_matches = {
    (k if int(re.findall("\d+", k)[0]) == int(re.findall("\d+", v)[0]) else None): 
    (v if int(re.findall("\d+", v)[0]) == int(re.findall("\d+", k)[0]) else None
    )
    for (k, v) in zip(cola, colb)
   }


    matching_cols = {k: v for k, v in all_matches.items() if v is not None}

    return matching_cols

print(matching_cols(df1,df2))
{'1_Num': 'Num1', '2_Num': 'Num2', '3_Num': 'Num3'}

df2_v2 = (
    df2.set_index("ID")
    .rename(columns=match_numeric_columns(df2, df1))
    .replace("", np.nan)
    .combine_first(df1.set_index("ID"))
)

print(df2_v2)
     Num1    Num2   Num3
13  0.943  8427.0  493.0
15  1.234   742.0  821.0
33  0.853  9834.0  234.0
67  0.852  3421.0  439.0
87  0.372  2345.0  843.0
92  1.957  1234.0  123.0