I have a dataframe as described below, and I need to find duplicate groups based on the columns value1, value2 and value3 (rows are grouped by id). I need to fill the column 'duplicated' with True if the group appears elsewhere in the table, and with False if the group is unique.
Note: each group has a different id.
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'],
    'value1': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
    'value2': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
    'value3': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
    # placeholder column to be filled; an empty list here would raise a length-mismatch error
    'duplicated': [None] * 13,
})
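Expected result (groups A and C contain the same rows, so their rows should get True; groups B and D are unique, so their rows should get False):
I tried this, but it compares individual rows; I need to compare whole groups (grouped by id):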
import pandas as pd
data = pd.read_excel('C:/Users/path/Desktop/example.xlsx')
# keep=False: mark every occurrence of a duplicated row as True (this is a row-level check)
data['duplicates'] = data.duplicated(subset=["value1", "value2", "value3"], keep=False)
data.to_excel('C:/Users/path/Desktop/example_result.xlsx',index=False)
and I got:
Note: the order of the records within a group doesn't matter.
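To make the requirement concrete, here is a minimal sketch of one possible group-level comparison. It is not a definitive solution. It assumes that two groups count as equal when they contain the same rows of value1, value2 and value3, regardless of order and including any repeated rows. The names `cols`, `signatures` and `counts` are illustrative and not part of the original code.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'],
    'value1': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
    'value2': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
    'value3': ['1', '2', '3', '4', '1', '2', '1', '2', '3', '4', '1', '2', '3'],
})

cols = ['value1', 'value2', 'value3']

# One hashable, order-independent signature per id:
# the sorted tuple of that group's rows (assumption: order within a group is irrelevant).
signatures = {
    group_id: tuple(sorted(map(tuple, group[cols].to_numpy())))
    for group_id, group in df.groupby('id')
}

# Count how many ids share each signature.
counts = Counter(signatures.values())

# A row is flagged when its group's signature occurs under more than one id.
df['duplicated'] = df['id'].map(lambda group_id: counts[signatures[group_id]] > 1)

print(df)
```

With the sample data, groups A and C share a signature, so their rows are flagged, while B and D are not. If repeated rows inside a group should be ignored, the sorted tuple could be replaced by a `frozenset` of the rows.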


