I have a dataset consisting of id-name pairs. These pairs can appear multiple times. In some rows the value in the name column is missing, in others the id is missing:
id_name = data.table(stringsAsFactors=FALSE,
id = c("x123", "xy234","x123",NA,"z123","y123","d654","d654"),
name = c("john", "mary", NA,"mary","bob","bob", "john",NA)
)
I would like to replace only the NA values in one column with the (non-missing) value from another row, using the other column as lookup value. The result should be:
id name
1: x123 john
2: xy234 mary
3: x123 john
4: xy234 mary
5: z123 bob
6: y123 bob
7: d654 john
8: d654 john
What is the idiomatic way of achieving this, the data.table way? The following approach for updating only the name column does not produce an error, but it does not update the dataset either :
id_name[is.na(name)][id_name[!is.na(name)], on="id", name:=i.name]
> id_name
id name
1: x123 john
2: xy234 mary
3: x123 <NA>
4: <NA> mary
5: z123 bob
6: y123 bob
7: d654 john
8: d654 <NA>
However, just subset-and-join without update works:
id_name[is.na(name)][id_name[!is.na(name)], on="id",nomatch=0L]
id name i.name
1: x123 <NA> john
2: d654 <NA> john
UPDATE
Updated MRE to make clear that some names can be associated with 2 ids. Both ids have to be kept.