I am looking for an efficient way to remove rows of a tibble whose non-missing values all match the corresponding values in another row, so that the row differs from the other one only by having additional missing values. Consider this fake example:
library(tidyverse)
phony_genes <- tribble(
  ~mouse_entrez, ~mgi_symbol, ~human_entrez, ~hgnc_symbol,
  1, "a", 2, "A",
  1, "a", 2, NA,
  1, NA,  2, "A",
  1, "a", 3, NA,
  4, "b", 3, NA,
  5, NA,  2, "A"
)
Row 2 is a subset of row 1, because each non-missing value in row 2 is the same as in row 1. The same goes for row 3, but a different value is missing. I am looking for a way that uses the tidyverse (or other packages) to filter out rows 2 and 3 but keep the other rows. I can't filter out the NA values in hgnc_symbol or mgi_symbol, because in both cases I would lose rows that I want to keep. I can't group by mouse_entrez and filter away the NA values within the groups, because I want to keep row 4. This simple example could of course be expanded to a huge tibble. I could probably do this by coding something myself, but I am wondering if anyone has an elegant solution.
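The subset rule described above can be made concrete with a brute-force sketch in base R. This is only an illustration of the rule, not the elegant or efficient solution being asked for: it builds n-by-n matrices, so it will not scale to a huge tibble. The name drop_subset_rows is hypothetical (it does not come from any package), and the choice to keep the first of several exactly identical rows is an assumption that the question does not address.

```r
# Hypothetical helper (not from any package): drop every row whose
# non-missing values all match the same columns of another row.
drop_subset_rows <- function(df) {
  n <- nrow(df)
  n_missing <- rowSums(is.na(df))

  # covered[i, j] ends up TRUE when, in every column, row i is either
  # missing or equal to row j.
  covered <- matrix(TRUE, n, n)
  for (col in names(df)) {
    x <- df[[col]]
    same <- outer(x, x, "==")      # NA wherever either value is missing
    same[is.na(same)] <- FALSE
    # is.na(x) recycles down the rows, so it applies to row i
    covered <- covered & (is.na(x) | same)
  }

  # Row i is redundant if some row j covers it and either has fewer NAs
  # or is an identical row that appears earlier (assumption: keep the
  # first of exact duplicates).
  more_missing <- outer(n_missing, n_missing, ">")
  earlier_dup <- outer(seq_len(n), seq_len(n), ">")
  redundant <- covered & (more_missing | earlier_dup)

  df[rowSums(redundant) == 0, , drop = FALSE]
}
```

Calling drop_subset_rows(phony_genes) on the example above removes rows 2 and 3 and keeps rows 1, 4, 5 and 6.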
mouse_entrez? So row 1 is matched with row 2, row 2 with 3, or row 1 is matched with 2, 3 and 4? – Ronak Shah
a and A are different. Do you want to ignore the case? – Ronak Shah