I have three data-frames defined df1,df2 and df3. I need a simple way to find the unmatched records between df1, df2 and df3. Its opposite of the intersection of these data frames. If I compare it with SQL world it can be achieved by left outer joins between these three tables and filter on null columns to get unmatched records. How this can be easily done in Python?
0
votes
2 Answers
0
votes
If you want to join by a specific columns of the Dataframes you can do something like that:
indexes = df1[(df1[TARGET_COLUMN]!=df2[TARGET_COLUMN]) | (df1[TARGET_COLUMN]!=df3[TARGET_COLUMN])].index
Where the TARGET_COLUMN should be the name of the column.
If you want to compare between entire rows you can use:
def compare_indexes(x):
index = x.name
for column_name in df1.columns.values:
if not (x[column_name] == df2.iloc[index,:][column_name] == df3.iloc[index,:][column_name]):
return False
return True
df1["match"] = df1.apply(compare_indexes, axis=1)
indexes = df1[df1["match"]==False].index