How to create new column in data frame and insert data based on multiple where clauses in another dataframe?

Question

I have two dataframes like this:

df1

Win   ID
1     345
0     678
0     901
1     124


df2

Code     ID     Win
a        123     1
b        nan     nan
c        nan     nan

I want to create a new column in df1 called 'code' only where 'Win' in df1 equals '1' and insert the value from df2 column 'code' WHERE 'ID' and 'Win' in df2 are nan.

I also need to update df2 columns 'ID' and 'Win' with the 'ID' from df1 only for the 'Code' that has been assigned in df1. 'Win' should equal 1.

How would I do this?

df1 and df2 updated should look like this:

df1

Win   ID      Code
1     345     b
0     678     nan
0     901     nan
1     124     c

df2

Code     ID     Win
a        123     1
b        345     1
c        124     1

I tried to extract the index and apply values but I can not get past how to only select the value of the integer like so:

idx = df2.loc[(df2['Win'].isnull()) & (df2['ID'].isnull())].index.values.astype(int)[0]

Your example as the same amount of nan rows in df2 than Win=1 in df1, but do you have the same condition in your real data? otherwise how do you decide which ID get added in df2 and/or code in df1? also the fact that ID=123 is already assigned to code=a has an importance on the rest of your problem? — Ben.T
@Ben.T 1. There will always be a mismatch of the count of rows between the dataframes. The logic should only pick values from df2 where the condition is met. 2. I want to assign the ID that is the first one available that has met the condition from the top of the dataframe - if that makes sense. 3. yes, ID=123 does have a importance, which is to not assgn the code 'a' to anything. avoid this row when selecting the criteria. — RustyShackleford

Westfalenmats Westfalenmats · Accepted Answer · 2020-08-12T16:01:37

If there are always enough free codes in df2 (which needs to be checked if not) you could define a loop as follows

df1 = pd.DataFrame(columns=['Win', 'ID'], data=[[1, 345], [0, 678], [0, 901], [1, 124]])
df2 = pd.DataFrame(columns=['Code', 'ID', 'Win'], data=[['a', 123, 1], ['b'], ['c']])

for idx in df1.loc[df1['Win']==1].index:
    code_idx = df2[(df2['Win'].isnull()) & (df2['ID'].isnull())].index[0]
    df1.loc[idx, 'Code'] = df2.loc[code_idx, 'Code']
    df2.loc[code_idx, 'ID'] = df1.loc[idx, 'ID']
    df2.loc[code_idx, 'Win'] = 1

How to create new column in data frame and insert data based on multiple where clauses in another dataframe?

1 Answers