I have a dataset with some duplicate entries. I want to reduce it to only the unique combinations of values, with a dup_num column giving the number of duplicate entries and a dup_rows column listing the rows that contain the duplicated data.
I implemented a solution based on "Finding duplicate observations of selected variables in a tibble", but it throws a mess of warnings when the list column of row numbers is coerced to a character vector. That is not a problem right now, but I want to display this data with DT and Shiny, where the warnings are a problem.
library(tidyverse)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c(
               "Moe", "Larry", "Curly", "Shemp", "extra"
             ), 6))

chr_dups <- as_mapper( ~ str_c(.x) %>%
                         str_remove_all("[c\\(\\)]"))

df %>%
  nest(episode, .key = "dups") %>%
  mutate(dup_num = map_dbl(dups, nrow),
         dup_rows = map_chr(dups, chr_dups))
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> # A tibble: 15 x 5
#>    day   name  dups             dup_num dup_rows
#>    <chr> <chr> <list>             <dbl> <chr>
#>  1 Mon   Moe   <tibble [2 x 1]>       2 1, 16
#>  2 Wed   Larry <tibble [2 x 1]>       2 2, 17
#>  3 Fri   Curly <tibble [2 x 1]>       2 3, 18
#>  4 Mon   Shemp <tibble [2 x 1]>       2 4, 19
#>  5 Wed   extra <tibble [2 x 1]>       2 5, 20
#>  6 Fri   Moe   <tibble [2 x 1]>       2 6, 21
#>  7 Mon   Larry <tibble [2 x 1]>       2 7, 22
#>  8 Wed   Curly <tibble [2 x 1]>       2 8, 23
#>  9 Fri   Shemp <tibble [2 x 1]>       2 9, 24
#> 10 Mon   extra <tibble [2 x 1]>       2 10, 25
#> 11 Wed   Moe   <tibble [2 x 1]>       2 11, 26
#> 12 Fri   Larry <tibble [2 x 1]>       2 12, 27
#> 13 Mon   Curly <tibble [2 x 1]>       2 13, 28
#> 14 Wed   Shemp <tibble [2 x 1]>       2 14, 29
#> 15 Fri   extra <tibble [2 x 1]>       2 15, 30
Created on 2019-09-19 by the reprex package (v0.3.0)
I am pretty sure that the problem is in as_mapper().
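A minimal sketch for narrowing this down, assuming the warning actually comes from str_c() receiving each nested one-column tibble (which is a list) rather than from as_mapper() itself. The names rows_chr and out are hypothetical, and group_by()/summarise() is swapped in for nest() here only to keep the sketch independent of the tidyr version; it is one possible approach, not a confirmed fix.

```r
library(tidyverse)

# One nested element, shaped like the entries of the dups list column:
# a one-column tibble holding the row numbers.
dups <- tibble(episode = c(1L, 16L))

# Assumption: handing the tibble itself to str_c() is what triggers
# "argument is not an atomic vector; coercing" (left commented out,
# since the behaviour may differ between stringr versions).
# str_c(dups)

# Hypothetical helper: pass the atomic column and collapse it directly,
# so nothing has to be coerced and no "c(...)" has to be stripped.
rows_chr <- function(d) str_c(d$episode, collapse = ", ")
rows_chr(dups)
#> [1] "1, 16"

# Same idea on the full toy data, without a list column at all.
df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

out <- df %>%
  group_by(day, name) %>%
  summarise(dup_num = n(),
            dup_rows = str_c(episode, collapse = ", ")) %>%
  ungroup()
```

Note that group_by() sorts the result by day and name, so the row order differs from the nest() output, and n() returns an integer count rather than a double.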
The reprex above uses representative toy data. The tibble describes some episodes from the Three Stooges, the day each episode ran, and the character who was the protagonist of the episode.
Thanks!