1
votes

I'm trying to extend my own workflow from deleting columns ([1] tidyverse - delete a column within a nested column/list) to filtering rows within a nested column/list. I found this potential solution: [2] Use filter() (and other dplyr functions) inside nested data frames with map()

My problem is that, in each "nest", I want to keep only those rows that are not completely NA (i.e. I want to keep any row that has at least one non-missing value).

However, the simple solution in [2] doesn't work for me, probably because I want to filter on the number of NAs per row, which might require another map function within the filter.

(Note: I'm using the current GitHub version of dplyr within tidyverse, which offers some new experimental functions such as condense, which I use below, but I don't think that's relevant to my problem/question.)

I have the following data:

library(tidyverse)
library(corrr)

dat <- data.frame(grp = rep(1:4, each = 25),
                  Q1 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q2 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q3 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q4 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q5 = sample(c(NA), 100, replace = TRUE),
                  Q6 = sample(c(1:5, NA), 100, replace = TRUE))

I now calculate the correlations of Q1 to Q6 per group and delete the rowname column.

cor_dat <- dat %>%
  group_by(grp) %>%
  condense(cor = correlate(cur_data()) %>%
         select(-rowname)) %>%
  ungroup()

But adding this line to my pipeline doesn't work:

cor_dat <- cor_dat %>%
  mutate(cor = map(cor, ~ filter(., sum(is.na(.)) != ncol(.))))

I also tried this, but it doesn't work either:

cor_dat <- cor_dat %>%
      mutate(cor = map(cor, ~ filter(., !all(is.na(.)))))

The expected outcome for my data is that the fifth row in each nest is filtered out.

Indeed, edited my post accordingly. – deschen
Sure, I just forgot to copy this over from the other thread. Edited my post accordingly. – deschen

1 Answer

2
votes

Here, we can use filter_all with any_vars:

library(dplyr)
library(purrr)
cor_dat <- cor_dat %>% 
               mutate(cor = map(cor, ~ .x %>% 
                              filter_all(any_vars(!is.na(.)))))
cor_dat
# A tibble: 4 x 2
#    grp cor             
#  <int> <list>          
#1     1 <tibble [5 × 6]>
#2     2 <tibble [5 × 6]>
#3     3 <tibble [5 × 6]>
#4     4 <tibble [5 × 6]>

cor_dat$cor[[1]]
# A tibble: 5 x 6
#       Q1      Q2      Q3      Q4    Q5      Q6
#    <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
#1 NA      -0.226   0.288  -0.0536    NA  0.581 
#2 -0.226  NA      -0.382   0.212     NA  0.0274
#3  0.288  -0.382  NA      -0.0772    NA -0.153 
#4 -0.0536  0.212  -0.0772 NA         NA  0.0831
#5  0.581   0.0274 -0.153   0.0831    NA NA     
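
In dplyr 1.0.4 and later, filter_all() and any_vars() are superseded, and the same row test can be written with if_any(). The following is a minimal sketch of that variant, not the original answer's code. The toy tibble and the names toy and toy_filtered are made up for illustration and stand in for cor_dat.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for cor_dat: one nested tibble per group,
# each containing one row that is NA in every column.
toy <- tibble(
  grp = 1:2,
  cor = list(
    tibble(Q1 = c(NA, 0.2, NA),  Q2 = c(0.2, NA, NA)),
    tibble(Q1 = c(NA, -0.5, NA), Q2 = c(-0.5, NA, NA))
  )
)

# Keep a row if at least one of its columns is not NA.
toy_filtered <- toy %>%
  mutate(cor = map(cor, function(d) {
    filter(d, if_any(everything(), ~ !is.na(.x)))
  }))

# Number of rows left in each nested tibble
map_int(toy_filtered$cor, nrow)
```

Using a named function argument (d) for the outer map() call avoids confusion with the .x used inside the if_any() lambda.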

Or, if we need to use filter, create the logic with rowSums:

cor_dat %>%
       mutate(cor = map(cor, ~ .x %>%
                                  filter(rowSums(is.na(.)) < ncol(.))))
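
A likely reason the two attempts in the question keep every row is that sum(is.na(.)) and all(is.na(.)) each return a single value for the whole nested tibble, and filter() recycles that single TRUE or FALSE to all rows. rowSums() returns one count per row instead. A small sketch on a made-up tibble (d, kept_scalar and kept_rowwise are illustrative names, not part of the question's data):

```r
library(dplyr)

# Made-up tibble: the third row is NA in every column.
d <- tibble(Q1 = c(NA, 0.2, NA), Q2 = c(0.2, NA, NA))

sum(is.na(d))      # a single number for the whole table: 4
rowSums(is.na(d))  # one NA count per row: 1, 1, 2

# Scalar condition: 4 != 2 is TRUE, so it is recycled and every row is kept.
kept_scalar <- filter(d, sum(is.na(d)) != ncol(d))

# Row-wise condition: only rows with fewer NAs than columns are kept.
kept_rowwise <- filter(d, rowSums(is.na(d)) < ncol(d))

nrow(kept_scalar)   # 3
nrow(kept_rowwise)  # 2
```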

data

cor_dat <- dat %>%
   group_by(grp) %>%
   condense(cor = correlate(cur_data()) %>% 
                 select(-rowname)) %>% 
   ungroup()
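
In case condense() is not available in the installed dplyr, a similar nested structure can be built with tidyr::nest() and purrr::map(). The sketch below rests on several assumptions: it swaps corrr::correlate() for base cor() to stay self-contained, so the diagonal is 1 rather than NA, and it uses a smaller made-up data set (toy_dat, with Q3 entirely missing) in place of dat.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
# Made-up data in the spirit of dat: Q3 is missing everywhere.
toy_dat <- data.frame(grp = rep(1:2, each = 25),
                      Q1 = sample(c(1:5, NA), 50, replace = TRUE),
                      Q2 = sample(c(1:5, NA), 50, replace = TRUE),
                      Q3 = NA_real_)

toy_cor <- toy_dat %>%
  nest(data = -grp) %>%                     # one nested data frame per group
  mutate(cor = map(data, function(d) {
    # base cor() instead of corrr::correlate(); pairs with no
    # complete observations come back as NA
    as_tibble(suppressWarnings(cor(d, use = "pairwise.complete.obs")))
  })) %>%
  select(grp, cor)

# Same rowSums() filter as above: drop rows that are NA in every column.
toy_cor_filtered <- toy_cor %>%
  mutate(cor = map(cor, ~ .x %>%
                     filter(rowSums(is.na(.)) < ncol(.))))

map_int(toy_cor_filtered$cor, nrow)
```

Here the all-NA Q3 row is dropped from each nested correlation table while the Q3 column is kept, which matches the shape of the output shown above.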