Here is an example data.table
dt <- data.table(col1 = c('A', 'A', 'B', 'C', 'C', 'D'), col2 = c(NA, 'dog', 'cat', 'jeep', 'porsch', NA))
col1 col2
1: A NA
2: A dog
3: B cat
4: C jeep
5: C porsch
6: D NA
I want to remove rows where col1 is duplicated if col2 is NA and has a non-NA value in another row. AKA group by col1, then if group has more than one row and one of them is NA, remove it. This would be the result for dt
:
col1 col2
2: A dog
3: B cat
4: C jeep
5: C porsch
6: D NA
I tried this:
dt[, list(col2 = ifelse(length(col1>1), col2[!is.na(col2)], col2)), by=col1]
col1 col2
1: A dog
2: B cat
3: C jeep
4: D NA
What am I missing? Thank you