Flag duplicate observations based on two ID variables

Question

I have a data.table with id1 and id2 columns (as below):

data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2))

I would like to generate a flag that identifies duplicate associations between id1 and id2: if a particular id2 is already associated with an id1, the row should be flagged (see the explanation below).

I can think of a multi-step approach, but I am wondering whether there is an easier way to accomplish this. I would prefer to use only the data.table package.

Comment from akrun: The example data you created, data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2)), and the one shown are different.

akrun · Accepted Answer · 2021-09-06T20:19:30

We can use duplicated on 'id1' to get a logical vector, add 1 to turn it into a numeric index (1 for FALSE, 2 for TRUE), and use that index to pick values from the vector c("", "Y"):

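library(data.table)
# dt1 is the example data from the question
dt1 <- data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2))
dt1[, flag := c("", "Y")[1 + duplicated(id1)]]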
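
To see why the indexing works, here is a small base R sketch of the intermediate steps, using the id1 values from the question. The names dup and idx are only illustrative and are not part of the answer above.

```r
id1 <- c(1, 1, 2, 3, 3, 3, 4)

dup  <- duplicated(id1)   # TRUE where the id1 value has already appeared
idx  <- 1 + dup           # FALSE becomes 1, TRUE becomes 2
flag <- c("", "Y")[idx]   # index 1 picks "", index 2 picks "Y"

print(dup)
print(flag)
```

Only the second row for id1 = 1 and the second and third rows for id1 = 3 get "Y"; the first occurrence of each id1 is never flagged.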

If NA is acceptable instead of "" for the unflagged rows, another option is to specify the condition in i and assign only to the matching rows:

dt1[duplicated(id1), flag := "Y"]
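
As a rough side-by-side check, the sketch below applies both variants to the question's example data. The column names flag_blank and flag_na are hypothetical and are used only so the two results can sit in the same table.

```r
library(data.table)

# Example data from the question
dt1 <- data.table(id1 = c(1, 1, 2, 3, 3, 3, 4), id2 = c(1, 2, 2, 1, 2, 3, 2))

# Variant 1: every row gets a value, either "" or "Y"
dt1[, flag_blank := c("", "Y")[1 + duplicated(id1)]]

# Variant 2: only rows matching the condition in i are assigned; the rest stay NA
dt1[duplicated(id1), flag_na := "Y"]

print(dt1)
```

The two columns flag the same rows; they differ only in whether unflagged rows hold "" or NA, which matters if later steps test with is.na() or compare against "".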