Summarize data frame to return non-NA values along subsets

Question

Hoping that someone can help me with a trick. I've found similar questions online, but none of the examples I've seen do exactly what I'm looking for or work on my data structure.

I need to remove NAs from a data frame within each data subset and collapse the remaining non-NA values into a single row per subset.

Example:

#create example data
a <- c(1, 1, 1, 2, 2, 2) #this is the subsetting variable in the example
b <- c(NA, NA, "B", NA, NA, "C") #max 1 non-NA value for each subset
c <- c("A", NA, NA, "A", NA, NA)
d <- c(NA, NA, 1, NA, NA, NA) #some subsets for some columns have all NA values

dat <- as.data.frame(cbind(a, b, c, d)) 

> desired output
  a b c    d
  1 B A    1
  2 C A <NA>

Rules of thumb:

1) Remove NA values from each column.
2) Loop along data subsets (column "a" in the example above).
3) Within each subset, every column has at most 1 non-NA value, but some columns may have all NA values.

Ideas:

lapply or dplyr is probably helpful to loop along all columns.
na.omit is likely helpful if the subsetting column, which has entries for all rows, can be ignored (something like as.data.frame(lapply(dat.admin, na.omit))). The issue is returning the lapply output to a data frame when some subsets don't return any non-NA values.
x[which.min(is.na(x))] effectively accomplishes this, but only if laboriously applied to each individual column.
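
To make the last idea concrete, here is a minimal base R sketch of why x[which.min(is.na(x))] works and how it could be applied per subset. The helper name first_non_na and the use of data.frame(..., stringsAsFactors = FALSE) instead of as.data.frame(cbind(...)) are assumptions made for illustration, not part of the original example.

```r
# Return the first non-NA element of x, or NA if every element is NA.
# is.na(x) is FALSE (0) for real values and TRUE (1) for NA, so which.min()
# gives the position of the first non-NA value, or 1 when all values are NA.
first_non_na <- function(x) {
  x[which.min(is.na(x))]
}

first_non_na(c(NA, NA, "B"))  # "B"
first_non_na(c(NA, NA, NA))   # NA

# Example data from the question, built with data.frame() (an assumption)
# so that a and d stay numeric instead of being coerced by cbind().
a <- c(1, 1, 1, 2, 2, 2)
b <- c(NA, NA, "B", NA, NA, "C")
c <- c("A", NA, NA, "A", NA, NA)
d <- c(NA, NA, 1, NA, NA, NA)
dat <- data.frame(a, b, c, d, stringsAsFactors = FALSE)

# Split by the subsetting column, reduce every column of each piece to one
# value, then stack the one-row results back into a data frame.
rows <- lapply(split(dat, dat$a), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
})
res <- do.call(rbind, rows)
res
```

Because the helper always returns exactly one value (NA for an all-NA column), every subset yields a complete one-row data frame, which sidesteps the na.omit problem of columns shrinking to length zero.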

Any help is appreciated to put the final pieces together! Thank you!

MKR · Accepted Answer · 2018-03-29T21:22:15

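One solution uses dplyr::summarise_all. The data first needs to be grouped on a with group_by. Within each group, which.min(is.na(.)) gives the position of the first non-NA value in a column, or 1 if the column is all NA, so each column collapses to its single non-NA value or to NA.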

library(dplyr)

dat %>%
  group_by(a) %>%
  summarise_all(funs(.[which.min(is.na(.))]))
# # A tibble: 2 x 4
#    a      b      c      d     
#   <fctr> <fctr> <fctr> <fctr>
# 1   1      B      A      1     
# 2   2      C      A      <NA>
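
Note that funs() has been deprecated since dplyr 0.8.0. On newer versions, the same idea can be sketched with across(), assuming dplyr 1.0.0 or later. The stringsAsFactors = FALSE argument is an assumption added here so that the columns are character on any R version; it is not in the original example.

```r
library(dplyr)

# Example data from the question. cbind() coerces everything to character,
# so all four columns end up as character vectors.
a <- c(1, 1, 1, 2, 2, 2)
b <- c(NA, NA, "B", NA, NA, "C")
c <- c("A", NA, NA, "A", NA, NA)
d <- c(NA, NA, 1, NA, NA, NA)
dat <- as.data.frame(cbind(a, b, c, d), stringsAsFactors = FALSE)

# For every non-grouping column, keep the first non-NA value per group
# (or NA when the whole group is NA for that column).
res <- dat %>%
  group_by(a) %>%
  summarise(across(everything(), ~ .x[which.min(is.na(.x))]))
res
```

Inside a grouped summarise(), everything() selects only the non-grouping columns, so a is kept as the group key and b, c, and d are each reduced to one value per group.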
