purrr: error when turning a nested list to a character vector

Question

I have a dataset with some duplicate entries that I want to change to include only unique combinations of values, with a dup_num column to indicate the number of duplicate entries, and a dup_rows column to indicate which rows contain duplicate data.

I implemented a solution based on "Finding duplicate observations of selected variables in a tibble", but it throws a mess of warnings when coercing the list column of row numbers to a character vector. That is not a problem right now, but I want to show this data with DT and Shiny, and the warnings are a problem for that application.

library(tidyverse)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c(
               "Moe", "Larry", "Curly", "Shemp", "extra"
             ), 6))

chr_dups <- as_mapper( ~ str_c(.x) %>%
                         str_remove_all("[c\\(\\)]"))

df %>%
  nest(episode, .key = "dups") %>%
  mutate(dup_num = map_dbl(dups, nrow),
         dup_rows = map_chr(dups, chr_dups))
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> # A tibble: 15 x 5
#>    day   name  dups             dup_num dup_rows
#>    <chr> <chr> <list>             <dbl> <chr>   
#>  1 Mon   Moe   <tibble [2 x 1]>       2 1, 16   
#>  2 Wed   Larry <tibble [2 x 1]>       2 2, 17   
#>  3 Fri   Curly <tibble [2 x 1]>       2 3, 18   
#>  4 Mon   Shemp <tibble [2 x 1]>       2 4, 19   
#>  5 Wed   extra <tibble [2 x 1]>       2 5, 20   
#>  6 Fri   Moe   <tibble [2 x 1]>       2 6, 21   
#>  7 Mon   Larry <tibble [2 x 1]>       2 7, 22   
#>  8 Wed   Curly <tibble [2 x 1]>       2 8, 23   
#>  9 Fri   Shemp <tibble [2 x 1]>       2 9, 24   
#> 10 Mon   extra <tibble [2 x 1]>       2 10, 25  
#> 11 Wed   Moe   <tibble [2 x 1]>       2 11, 26  
#> 12 Fri   Larry <tibble [2 x 1]>       2 12, 27  
#> 13 Mon   Curly <tibble [2 x 1]>       2 13, 28  
#> 14 Wed   Shemp <tibble [2 x 1]>       2 14, 29  
#> 15 Fri   extra <tibble [2 x 1]>       2 15, 30

Created on 2019-09-19 by the reprex package (v0.3.0)

I am pretty sure that the problem is in as_mapper().

The reprex above uses representative toy data. The tibble describes some episodes from the Three Stooges, the day the episode ran, and the character who was the protagonist for the episode.

Thanks!

akrun · Accepted Answer · 2019-09-19T20:47:49

It is a warning because the list elements are not atomic: the 'dups' column is a list of tibbles, which we can see if we pull the column

df %>%
  nest(dups = episode)  %>% 
  pull(dups)
#<list_of<tbl_df<episode:integer>>[15]>
#[[1]]
# A tibble: 2 x 1
#  episode
#    <int>
#1       1
#2      16

#[[2]]
# A tibble: 2 x 1
#  episode
#    <int>
#1       2
#2      17
# ...
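
As a minimal sketch of the point above (assuming the tibble package is installed; `one_group` is a hypothetical stand-in for a single element of the 'dups' column), we can check that the tibble is not atomic while the column inside it is:

```r
library(tibble)

# One element of the nested 'dups' column, rebuilt by hand for illustration
one_group <- tibble(episode = c(1L, 16L))

is.atomic(one_group)          # FALSE: a tibble is a list of columns
is.atomic(one_group$episode)  # TRUE: the integer column itself is atomic

# Pasting the atomic column needs no coercion, so nothing is warned about
toString(one_group$episode)
#> [1] "1, 16"
```

This is why passing each tibble directly to a string function triggers the coercion warning, while passing the extracted column does not.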

So, it is a list of tibbles. Either we can extract the column with pull, or we can flatten it and apply the function

library(purrr)
df %>%
  nest(dups = episode) %>%
  mutate(dup_num = map_dbl(dups, nrow),
         # flatten_int() turns each one-column tibble into an integer vector
         dup_rows = map(dups, ~ flatten_int(.x) %>%
                          chr_dups))
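
The pull option mentioned above could look like the following sketch. It assumes dplyr, tidyr (1.0.0 or later, for the `nest(dups = episode)` syntax) and purrr are installed, and it swaps 'chr_dups' for `toString()`, which is a substitution made here for illustration rather than part of the original code; `res` is a hypothetical name for the result.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same toy data as in the question
df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

res <- df %>%
  nest(dups = episode) %>%
  mutate(dup_num = map_int(dups, nrow),
         # pull() extracts the atomic integer column from each tibble,
         # and toString() collapses it to a single string per row
         dup_rows = map_chr(dups, ~ toString(pull(.x, episode))))

res$dup_rows[1]
#> [1] "1, 16"
```

Because each call returns exactly one string, `map_chr()` can be used and 'dup_rows' ends up as a plain character column.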

NOTE: It is not clear why the function 'chr_dups' is applied to the 'episode' column, which is numeric. The transformations also do not make sense.

If we just need to paste the elements of 'episode' grouped by the other columns, a base R single line approach is

aggregate(episode ~ day + name, df, toString)
#   day  name episode
#1  Fri Curly   3, 18
#2  Mon Curly  13, 28
#3  Wed Curly   8, 23
#4  Fri extra  15, 30
#5  Mon extra  10, 25
#6  Wed extra   5, 20
#7  Fri Larry  12, 27
#8  Mon Larry   7, 22
#9  Wed Larry   2, 17
#10 Fri   Moe   6, 21
#11 Mon   Moe   1, 16
#12 Wed   Moe  11, 26
#13 Fri Shemp   9, 24
#14 Mon Shemp   4, 19
#15 Wed Shemp  14, 29
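
If both the count and the pasted row numbers are wanted in one step, a grouped summary is one possible equivalent. This is only a sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument); the column names 'dup_num' and 'dup_rows' follow the question, and `res` is a hypothetical name.

```r
library(dplyr)

# Same toy data as in the question
df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

res <- df %>%
  group_by(day, name) %>%
  summarise(dup_num = n(),                 # number of rows in each group
            dup_rows = toString(episode),  # e.g. "3, 18"
            .groups = "drop")
```

Here 'episode' is pasted while it is still an atomic integer vector, so no list column is created and there is nothing to coerce.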
