I have a data frame in which some rows are duplicated, and I want to merge only those duplicated rows. An example is given below:
  name  b  c  d
1   yp  3 NA NA
2   yp  3  1 NA
3   IG NA  3 NA
4   OG  4  1  0
Duplicated rows are rows that share the same name. In this example, rows 1 and 2 need to be merged into one row, with each NA replaced by the numerical value from the other row where one is available. The desired result is:
  name  b  c  d
1   yp  3  1 NA
2   IG NA  3 NA
3   OG  4  1  0
Assumption: if two rows have the same name and both have a non-NA value in a given column, then those two values are the same number.
library(data.table); setDT(data); data[, lapply(.SD, unique), by = name]. Note that this will fail if any of b, c, or d is not unique within name. - MichaelChirico
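
For illustration, here is a minimal base R sketch of one possible way to do the merge. It relies on the assumption stated in the question (non-NA values within a name never conflict), so it simply keeps the first non-NA value per column in each group. The helper name first_non_na and the variables groups, merged, and result are hypothetical names introduced for this example; they do not come from the question or the comment.

```r
# Example data from the question
data <- data.frame(
  name = c("yp", "yp", "IG", "OG"),
  b = c(3, 3, NA, 4),
  c = c(NA, 1, 3, 1),
  d = c(NA, NA, NA, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of a vector,
# or an NA of the same type if every value is NA
first_non_na <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) > 0) vals[1] else x[1]
}

# Split the value columns by name, keeping names in order of first appearance
value_cols <- setdiff(names(data), "name")
groups <- split(data[, value_cols, drop = FALSE],
                factor(data$name, levels = unique(data$name)))

# Collapse each group to a single row, then stack the rows
merged <- do.call(rbind, lapply(groups, function(g) {
  as.data.frame(lapply(g, first_non_na))
}))

result <- data.frame(name = names(groups), merged,
                     row.names = NULL, stringsAsFactors = FALSE)
print(result)
```

Because the sketch takes the first non-NA value without checking the others, it would silently pick one value if the assumption were violated. If that matters, a check such as length(unique(vals)) > 1 inside the helper could raise an error instead.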