I have a data.table with id1 and id2 columns (as below):

data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2))

id1 id2
1 1
1 2
2 2
3 1
3 2
3 3
4 2

I would like to generate a flag that identifies duplicate associations between id1 and id2: if a particular id2 is already associated with an id1, the row should be flagged (see the expected output below).

I can think of logic that takes multiple steps, but I am wondering whether there is an easier way to accomplish this. I would prefer to use only the data.table package.

id1 id2 flag
1 1
1 2 Y
2 2
3 1 Y
3 2 Y
3 3
4 2
The example data you created with data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2)) and the data you showed are different. - akrun

1 Answer

We can use duplicated() on 'id1' to get a logical vector, add 1 to turn it into a numeric index (FALSE becomes 1, TRUE becomes 2), and use that index to pick values from the vector c("", "Y"):

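library(data.table)
# dt1 holds the sample data from the question
dt1 <- data.table(id1=c(1,1,2,3,3,3,4), id2=c(1,2,2,1,2,3,2))
dt1[, flag := c("", "Y")[1 + duplicated(id1)]]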
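
To make the indexing trick easier to follow, here is a minimal base R sketch that splits the one-liner into its intermediate steps. The variable names dup, idx and flag are chosen only for illustration, and the id1 values are the sample data from the question.

```r
# Sample id1 values taken from the question
id1 <- c(1, 1, 2, 3, 3, 3, 4)

# TRUE for every value that has already appeared earlier in the vector
dup <- duplicated(id1)

# Adding 1 coerces FALSE/TRUE to 0/1, giving positions 1 or 2
idx <- 1 + dup

# Position 1 picks "", position 2 picks "Y"
flag <- c("", "Y")[idx]
print(flag)
```

On this sample, duplicated(id1) is TRUE for rows 2, 5 and 6, so those are the rows that receive "Y".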

If NA is acceptable in place of "", another option is to put the condition in i and assign only to the matching rows:

dt1[duplicated(id1), flag := "Y"]
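
As a rough illustration of what the subset assignment leaves behind, the sketch below rebuilds the sample data from the question (assuming dt1 is that table) and checks which rows end up as NA. The helper name na_rows is hypothetical.

```r
library(data.table)

# Sample data from the question; dt1 is the name the answer uses
dt1 <- data.table(id1 = c(1, 1, 2, 3, 3, 3, 4),
                  id2 = c(1, 2, 2, 1, 2, 3, 2))

# Assign "Y" only on rows where id1 has appeared before;
# rows not selected in i are left as NA in the new column
dt1[duplicated(id1), flag := "Y"]

# Row positions that were not flagged (hypothetical helper name)
na_rows <- dt1[, which(is.na(flag))]
print(dt1)
print(na_rows)
```

Because flag does not exist before the assignment, data.table creates it as a character column and fills the unselected rows with NA.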