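I want to add a variable to the data frames held in a list column, where the new variable is computed from a filtered data frame that is itself nested inside that list column.

Reprex, using the diamonds dataset that ships with ggplot2 (loaded as part of the tidyverse):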

library(tidyverse)

play <- 
 diamonds %>% 
 gather(letters, value, x:z) %>% 
 nest(letters, value, .key = "nest2") %>% 
 group_by(cut) %>% 
 nest(.key = "nest1")

I now have a 5x2 tibble with a cut column and nest1 which is the list column. Inside that are 6 normal variables and a further list column nest2.

I want to add a column to each tibble in nest1 that holds the number of rows in the corresponding nest2 tibble. I can do this with:

play_2 <- 
  play %>%
  mutate(nest1 = map(nest1, ~ mutate(.x, n_row = map_int(nest2, nrow))))
play_2$nest1[3] #to check

What I actually want is a count of the rows in nest2 that pass a filter, e.g. letters != "y". I have tried numerous subsetting options without success. I am sure this has to do with the fact that nest2 is a list of tibbles, but I can't figure out the correct way to approach it.


1 Answer


Instead of nrow, you can pass map_int an anonymous function, ~ sum(.x$letters != 'y'), which counts the rows of each nest2 tibble that satisfy the condition (each TRUE in the logical vector counts as 1):

play_2 <- 
    play %>%
    mutate(nest1 = map(nest1, 
        ~ mutate(.x, n_row = map_int(nest2, ~ sum(.x$letters != 'y')))
    ))

A few checks:

play_2$nest1[[1]]$n_row[[1000]]
# [1] 4

play_2$nest1[[1]]$nest2[[1000]]
# A tibble: 6 x 2
#  letters value
#  <chr>   <dbl>
#1 x        4.38
#2 x        4.34
#3 y        4.4 
#4 y        4.38
#5 z        2.73
#6 z        2.71

play_2$nest1[[2]]$n_row[[1000]]
#[1] 2

play_2$nest1[[2]]$nest2[[1000]]
# A tibble: 3 x 2
#  letters value
#  <chr>   <dbl>
#1 x        6.5 
#2 y        6.55
#3 z        3.89
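
To see the counting step in isolation, here is a minimal sketch on a toy tibble. The names toy, toy_counts, n_sum and n_filter are made up for illustration, and the two nest2 tibbles simply reuse the values printed above. It compares sum() over the logical condition with the more literal filter-then-nrow approach, which should give the same counts when letters has no missing values:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for one element of nest1: a tibble whose
# list column nest2 holds one small tibble per row.
toy <- tibble(
  id = 1:2,
  nest2 = list(
    tibble(letters = c("x", "x", "y", "y", "z", "z"),
           value   = c(4.38, 4.34, 4.40, 4.38, 2.73, 2.71)),
    tibble(letters = c("x", "y", "z"),
           value   = c(6.50, 6.55, 3.89))
  )
)

toy_counts <- toy %>%
  mutate(
    # Summing a logical vector counts its TRUE values
    n_sum    = map_int(nest2, ~ sum(.x$letters != "y")),
    # Equivalent here: filter each nested tibble, then count its rows
    n_filter = map_int(nest2, ~ nrow(filter(.x, letters != "y")))
  )

toy_counts$n_sum
toy_counts$n_filter
```

One difference to keep in mind: if letters contained NA, sum() would return NA unless called with na.rm = TRUE, whereas filter() silently drops rows where the condition is NA.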