Subset-join-replace using R data.table

Question

I have a dataset consisting of id-name pairs. These pairs can appear multiple times. In some rows the value in the name column is missing, in others the id is missing:

id_name = data.table(stringsAsFactors=FALSE,
                 id = c("x123", "xy234","x123",NA,"z123","y123","d654","d654"),
                 name = c("john", "mary", NA,"mary","bob","bob", "john",NA)
                     )

I would like to replace only the NA values in one column with the (non-missing) value from another row, using the other column as lookup value. The result should be:

            id name
      1:  x123 john
      2: xy234 mary
      3:  x123 john
      4: xy234 mary
      5:  z123  bob
      6:  y123  bob
      7:  d654 john
      8:  d654 john

What is the idiomatic way of achieving this, the data.table way? The following approach for updating only the name column does not produce an error, but it does not update the dataset either :

id_name[is.na(name)][id_name[!is.na(name)], on="id", name:=i.name]
> id_name
      id name
1:  x123 john
2: xy234 mary
3:  x123 <NA>
4:  <NA> mary
5:  z123  bob
6:  y123  bob
7:  d654 john
8:  d654 <NA>

However, just subset-and-join without update works:

id_name[is.na(name)][id_name[!is.na(name)], on="id",nomatch=0L]

     id name i.name
1: x123 <NA>   john
2: d654 <NA>   john

UPDATE

Updated MRE to make clear that some names can be associated with 2 ids. Both ids have to be kept.

Unknown Unknown · Accepted Answer · 2020-11-30T00:53:37

I think you are looking for:

id_name[is.na(name), name :=
    id_name[!is.na(name)][.SD, on=.(id), x.name]
]
id_name[is.na(id), id := 
    id_name[!is.na(id)][.SD, on=.(name), mult="first", x.id]
]
id_name

Subset-join-replace using R data.table

3 Answers

Update