0
votes

I have two dataframes, and df2 has more columns than df1.

If a row in df1 does not appear in df2, I want to select it into df3.

df1

    id  colA colB
0   1   4    1
1   2   5    2
2   3   2    4
3   4   4    2
4   5   2    4

df2

    id  colA colB colC
0   1   4    1    0
1   2   5    2    0
2   5   2    4    0

I want to select these rows from df1:

df3

    id  colA colB
0   3   2    4
1   4   4    2
3
Ok, so where is your code? - roganjosh
df3=df1.loc[~df1.id.isin(df2.id),].copy() - BENY
Are you only comparing on 'id' column? - Scott Boston

3 Answers

1
votes

Assuming you are comparing on the 'id' column (if not, please clarify), you can use Series.isin with boolean indexing.

>>> df3 = df1[~df1['id'].isin(df2['id'])]
>>> df3
   id  colA  colB
2   3     2     4
3   4     4     2
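
If the comparison should instead cover every column that df1 shares with df2 rather than just 'id', one option is a left merge with indicator=True, keeping only the rows marked 'left_only'. This is a minimal sketch, assuming all df1 columns also exist in df2 and that df1 has a default integer index; the names shared and merged are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'colA': [4, 5, 2, 4, 2],
                    'colB': [1, 2, 4, 2, 4]})
df2 = pd.DataFrame({'id': [1, 2, 5],
                    'colA': [4, 5, 2],
                    'colB': [1, 2, 4],
                    'colC': [0, 0, 0]})

# Assumption: every column of df1 is also present in df2.
shared = list(df1.columns)

# Left merge on the shared columns; '_merge' records where each row was found.
# Dropping duplicates in df2 first keeps one output row per df1 row.
merged = df1.merge(df2[shared].drop_duplicates(),
                   on=shared, how='left', indicator=True)

# Keep rows that exist only in df1.
df3 = merged.loc[merged['_merge'] == 'left_only', shared]
print(df3)
```

With the sample data this selects the rows with id 3 and 4, the same as the isin approach, but it would also keep a df1 row whose id exists in df2 with different colA or colB values.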
0
votes
df3 = df1.loc[~df1['id'].isin(list(df2['id']))]

Output:

   id  colA  colB
2   3     2     4
3   4     4     2
0
votes

Use drop_duplicates:

import pandas as pd

df1 = pd.DataFrame({'id': [1,2,3,4,5],
                    'colA':[4,5,2,4,2],
                    'colB':[1,2,4,2,4]})

df2 = pd.DataFrame({'id': [1,2,5],
                    'colA':[4,5,2],
                    'colB':[1,2,4]})

df3 = pd.concat([df1,df2]).drop_duplicates(subset='id',keep=False)
print(df3)

Output:

   id  colA  colB
2   3     2     4
3   4     4     2
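
One caveat with the concat approach: keep=False drops every id that occurs more than once in the combined frame, so the result is symmetric. It works here because every id in df2 also appears in df1. The sketch below uses made-up data (id 6 is hypothetical, not from the question) to show what happens otherwise, next to the isin filter for comparison.

```python
import pandas as pd

# Hypothetical data: id 6 exists only in df2.
df1 = pd.DataFrame({'id': [1, 2, 3]})
df2 = pd.DataFrame({'id': [1, 2, 6]})

# Symmetric result: rows unique to either frame survive.
both = pd.concat([df1, df2]).drop_duplicates(subset='id', keep=False)
print(both['id'].tolist())

# Filtering df1 directly keeps only rows that come from df1.
only_df1 = df1[~df1['id'].isin(df2['id'])]
print(only_df1['id'].tolist())
```

So if df2 can contain ids that df1 lacks, the isin filter is the safer choice.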