Python - How to find unmatched records from three datasets

Question

I have three data-frames defined df1,df2 and df3. I need a simple way to find the unmatched records between df1, df2 and df3. Its opposite of the intersection of these data frames. If I compare it with SQL world it can be achieved by left outer joins between these three tables and filter on null columns to get unmatched records. How this can be easily done in Python?

Make a set and calculate the difference should do the trick. — Dschoni

AndreyF AndreyF · Accepted Answer · 2017-02-14T12:08:09

If you want to join by a specific columns of the Dataframes you can do something like that:

indexes = df1[(df1[TARGET_COLUMN]!=df2[TARGET_COLUMN]) | (df1[TARGET_COLUMN]!=df3[TARGET_COLUMN])].index

Where the TARGET_COLUMN should be the name of the column.

If you want to compare between entire rows you can use:

def compare_indexes(x):
    index = x.name
    for column_name in df1.columns.values:
        if not (x[column_name] == df2.iloc[index,:][column_name] == df3.iloc[index,:][column_name]):
            return False
    return True


df1["match"] = df1.apply(compare_indexes, axis=1)
indexes = df1[df1["match"]==False].index

Python - How to find unmatched records from three datasets

2 Answers