Hoping that someone can help me with a trick. I've found similar questions online, but none of the examples I've seen do exactly what I'm looking for or work on my data structure.
I need to remove NAs from a data frame along data subsets and compress the remaining NA values into rows for each data subset.
Example:
#create example data
a <- c(1, 1, 1, 2, 2, 2) #this is the subsetting variable in the example
b <- c(NA, NA, "B", NA, NA, "C") #max 1 non-NA value for each subset
c <- c("A", NA, NA, "A", NA, NA)
d <- c(NA, NA, 1, NA, NA, NA) #some subsets for some columns have all NA values
dat <- as.data.frame(cbind(a, b, c, d))
> desired output
a b c d
1 B A 1
2 C A <NA>
Rules of thumb: 1) Need to remove NA values from each column 2) Loop along data subsets (column "a" in example above) 3) All columns, for each subset, have a max of 1 non-NA value, but some columns may have all NA values
Ideas:
- lapply or dplyr is probably helpful to loop along all columns
- na.omit is likely helpful, if the subsetting column that has entries for all rows can be ignored (something like as.data.frame(lapply(dat.admin, na.omit))). issue in returning lapply output to data frame if some subsets don't return any non-NA values
- x[which.min(is.na(x))] effectively accomplishes this if laboriously applied to each individual column
Any help is appreciated to put the final pieces together! Thank you!