
I'm cross-checking two dataframes to assign a flag to each row. If a key is present in both dataframes with a different value, the flag is set to "change" for that row. If the value is the same, the flag is set to "no change". However, if a key appears more than once in only one of the two dataframes, the extra rows for that key get the flag "additional change". Here is an example to make this clearer:

df 1:

key   value  key value present in df 2
abcd  1      False
wxyz  5      True

df 2:

key   value  key value present in df 1
abcd  2      False
wxyz  5      True

The result for dataframe 1 will then be:

df 1:

key   value  key value present in df 2  xcheck_flag
abcd  1      False                      change
wxyz  5      True                       no change

To get this result I use the following logic:

def changeType(df1):
    # Flag a row "change" when its key/value pair is absent from df 2.
    def condition_check(row):
        if not row['key value present in df 2']:
            return 'change'
        return 'no change'

    df1['xcheck_flag'] = df1.apply(condition_check, axis=1)
    return df1

Now this is rather straightforward, right? Well, I have a complication that I haven't been able to solve yet.

Imagine the following use case:

df 1:

key   value  key value present in df 2
abcd  1      False
wxyz  5      True
abcd  3      False

df 2:

key   value  key value present in df 1
abcd  2      False
wxyz  5      True

In this case, the key abcd appears twice in df 1 and only once in df 2. When this happens, I need to apply the following logic in the cross-dataframe check: the first time the key is matched against dataframe 2, set the flag to "change" as in the previous case; every further time the key is matched, set the flag to "additional change". It doesn't matter which row from df 1 gets "change" and which gets "additional change". The only condition is that, in such a case, exactly one key-value pair is flagged "change" and all the other rows with that key are flagged "additional change".

This gives us:

df 1:

key   value  key value present in df 2  xcheck_flag
abcd  1      False                      change
wxyz  5      True                       no change
abcd  3      False                      additional change

I've been trying to adapt my initial function to include this behaviour but without success.

Any hints would be greatly appreciated!
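
A minimal sketch of one way the rule described above could be expressed, assuming df 1 already carries the boolean column `key value present in df 2`. The function name `add_xcheck_flag` and the constant `PRESENT_COL` are hypothetical names chosen for illustration. The sketch also assumes that only rows whose key/value pair is absent from df 2 count toward "change" and "additional change", and that the first such row per key (in row order) is the one flagged "change":

```python
import pandas as pd

# Assumed column name, taken from the example tables above.
PRESENT_COL = "key value present in df 2"


def add_xcheck_flag(df1: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df1 with an 'xcheck_flag' column (hypothetical helper)."""
    out = df1.copy()

    # True for rows whose key/value pair was not found in df 2.
    changed = ~out[PRESENT_COL].astype(bool)

    # Running count of changed rows within each key: 1 for the first
    # changed row of a key, 2 for the second, and so on.
    rank = changed.astype(int).groupby(out["key"]).cumsum()

    out["xcheck_flag"] = "no change"
    out.loc[changed & (rank == 1), "xcheck_flag"] = "change"
    out.loc[changed & (rank > 1), "xcheck_flag"] = "additional change"
    return out


df1 = pd.DataFrame(
    {
        "key": ["abcd", "wxyz", "abcd"],
        "value": [1, 5, 3],
        PRESENT_COL: [False, True, False],
    }
)
result = add_xcheck_flag(df1)
print(result["xcheck_flag"].tolist())
# ['change', 'no change', 'additional change']
```

Because the flags are assigned with boolean masks instead of a row-wise `apply`, the per-key counting stays in one grouped operation. Which of the duplicate rows receives "change" depends only on row order, which matches the requirement that the choice does not matter.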