Problem
I'm trying to use dplyr::mutate() and dplyr::case_when() to create a new column in a data frame. The column should be populated from data stored in another object (a "lookup list"), based on the values in existing columns of the data frame.
I know the answer probably has to do with (im)proper use of quasiquotation and NSE, but I'm having trouble extrapolating the information in the Programming with dplyr vignette to my situation.
I'm hoping that posting this reprex here can guide me to the right answer, and I think that solving this problem will go a long way towards helping me grok NSE.
Sample data
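library(tidyverse)  # provides tibble(), mutate(), case_when(), pluck() and %>%

# Lookup list: one entry per letter, each holding a "foo" and a "bar" value
key_list <- list(
  "a" = list(
    foo = 1,
    bar = 2),
  "b" = list(
    foo = 3,
    bar = 4),
  "c" = list(
    foo = 5,
    bar = 6)
)

x <- tibble(fruit = c("apple", "orange", "grape", "apple", "apple", "orange"),
            `Old Letter` = c("a", "a", "b", "c", "c", "c"),
            `Old Number` = c(9, 8, 7, 6, 5, 4)
)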
x
# # A tibble: 6 x 3
# fruit `Old Letter` `Old Number`
# <chr> <chr> <dbl>
# 1 apple a 9
# 2 orange a 8
# 3 grape b 7
# 4 apple c 6
# 5 apple c 5
# 6 orange c 4
Goal
Specifically, I want to create a new column in x (which I will call `New Number`) that is populated based on the values in x$fruit and x$`Old Letter`.
Here's the code that mimics where I get hung up in my actual use case:
x %>% mutate(`New Number` = case_when(
  fruit == "apple"  ~ pluck(key_list, `Old Letter`, "foo") * 10,
  fruit == "orange" ~ pluck(key_list, `Old Letter`, "foo") * 100,
  fruit == "grape"  ~ pluck(key_list, `Old Letter`, "foo") * 1000
))
# Error: Index 1 must have length 1, not 6
Expected output
In my mind, the (desired) order of operations looks like this for, e.g., the first row of x:
- fruit == "apple" is TRUE, so evaluate this expression: pluck(key_list, `Old Letter`, "foo") * 10
- Since the value in the `Old Letter` column for this row is "a", the expression becomes pluck(key_list, "a", "foo") * 10 (which should operate on the key_list object in the global environment).
- This simplifies to 1 * 10, which equals 10.
- Place the result of evaluating this expression into the `New Number` column.
Extrapolating this to the entire command, I expected this as the output:
# # A tibble: 6 x 4
#   fruit  `Old Letter` `Old Number` `New Number`
#   <chr>  <chr>               <dbl>        <dbl>
# 1 apple  a                       9           10
# 2 orange a                       8          100
# 3 grape  b                       7         3000
# 4 apple  c                       6           50
# 5 apple  c                       5           50
# 6 orange c                       4          500
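For what it's worth, the row-by-row order of operations described above can be sketched by doing the lookup once per element with purrr::map_dbl() before case_when() sees it. This is only a sketch under the assumptions of the sample data (every looked-up value is a single number); the intermediate column name looked_up is hypothetical and not part of my real code.

```r
library(dplyr)
library(purrr)
library(tibble)

key_list <- list(
  a = list(foo = 1, bar = 2),
  b = list(foo = 3, bar = 4),
  c = list(foo = 5, bar = 6)
)

x <- tibble(
  fruit = c("apple", "orange", "grape", "apple", "apple", "orange"),
  `Old Letter` = c("a", "a", "b", "c", "c", "c"),
  `Old Number` = c(9, 8, 7, 6, 5, 4)
)

result <- x %>%
  mutate(
    # looked_up is a hypothetical helper column: map_dbl() calls pluck()
    # once per element of `Old Letter`, so each index has length 1
    looked_up = map_dbl(`Old Letter`, ~ pluck(key_list, .x, "foo")),
    # case_when() now only combines whole vectors of equal length
    `New Number` = case_when(
      fruit == "apple"  ~ looked_up * 10,
      fruit == "orange" ~ looked_up * 100,
      fruit == "grape"  ~ looked_up * 1000
    )
  ) %>%
  select(-looked_up)

result
```

The design choice here is to keep the per-element work (the list lookup) separate from the vectorised work (the multiplication inside case_when()).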
My take:
Judging from the error message I'm getting, it looks like rather than a single value from the `Old Letter` column being used as an index for pluck() to use, the entire `Old Letter` column is being passed as a vector. I'm guessing that this is because, according to the documentation for case_when():
"case_when() is not a tidy eval function."
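A small probe seems to confirm the "whole column" guess. This is a sketch, not a diagnosis: probe_length() is a hypothetical helper that just reports how many elements it was handed, and the rowwise() comparison is included only to show the contrast.

```r
library(dplyr)
library(tibble)

x <- tibble(`Old Letter` = c("a", "a", "b", "c", "c", "c"))

# Hypothetical helper: returns the number of elements it receives
probe_length <- function(v) length(v)

# Inside a plain mutate(), the function receives the entire column at once
probed <- x %>% mutate(n_seen = probe_length(`Old Letter`))
probed$n_seen
# 6 6 6 6 6 6

# With rowwise(), the same function receives one value per call
probed_rowwise <- x %>%
  rowwise() %>%
  mutate(n_seen = probe_length(`Old Letter`)) %>%
  ungroup()
probed_rowwise$n_seen
# 1 1 1 1 1 1
```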
I've tried to track down how this is happening, but the stack trace didn't seem to point me anywhere helpful, and wrapping the whole command in either rlang::qq_show() or quo() didn't show me how R was interpreting the command with respect to NSE, because both threw that same error as well.
I've experimented with combinations of quo(), enquo(), !!, !!enquo() (shortened as {{ }}), and sym() in the reprex code above, as well as wrapping it up into a function, but it throws the same error:
get_num <- function(x, y) purrr::pluck(key_list, x, y)
x %>% mutate(`New Number` = case_when(
  fruit == "apple"  ~ get_num(`Old Letter`, "foo") * 10,
  fruit == "orange" ~ get_num(`Old Letter`, "foo") * 100,
  fruit == "grape"  ~ get_num(`Old Letter`, "foo") * 1000
))
# Error: Index 1 must have length 1, not 6
This answer to another question on SO tells me that:
"I guess what you are missing about case_when() is that the arguments are evaluated at once, not per row."
But I'm not sure if/how that applies to my situation, so I'm at a loss.
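As far as I can tell, the "evaluated at once" behaviour from that quote can be seen in a tiny example unrelated to my data. This is a sketch with a made-up vector v: the right-hand side sqrt(v) is computed for every element, including the one whose condition is FALSE, which is why a NaN warning appears even though that element never ends up in the result.

```r
library(dplyr)

v <- c(4, -1)  # made-up input for illustration

# sqrt(v) is evaluated on the whole vector before any rows are picked,
# so sqrt(-1) is computed (and warns) even though v >= 0 is FALSE there
out <- case_when(
  v >= 0 ~ sqrt(v),
  TRUE   ~ 0
)
out
# 2 0
```

If the same thing happens in my code, each pluck() call would be receiving all six letters at once, which matches the "must have length 1, not 6" error.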
Anyways, thanks for any help you folks can provide!
sessionInfo():
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.15
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rlang_0.4.1 readxl_1.3.1 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[10] ggplot2_3.2.1 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 cellranger_1.1.0 pillar_1.4.2 compiler_3.6.0 base64enc_0.1-3 tools_3.6.0 digest_0.6.22 zeallot_0.1.0 evaluate_0.14
[10] lubridate_1.7.4 jsonlite_1.6 lifecycle_0.1.0 nlme_3.1-141 gtable_0.3.0 lattice_0.20-38 pkgconfig_2.0.3 cli_1.1.0 rstudioapi_0.10
[19] yaml_2.2.0 haven_2.1.1 xfun_0.10 withr_2.1.2 xml2_1.2.2 httr_1.4.1 knitr_1.25 generics_0.0.2 vctrs_0.2.0
[28] hms_0.5.1 grid_3.6.0 tidyselect_0.2.5 glue_1.3.1 R6_2.4.0 fansi_0.4.0 rmarkdown_1.16 modelr_0.1.5 magrittr_1.5
[37] htmltools_0.4.0 backports_1.1.5 scales_1.0.0 rvest_0.3.4 assertthat_0.2.1 colorspace_1.4-1 utf8_1.1.4 stringi_1.4.3 lazyeval_0.2.2
[46] munsell_0.5.0 broom_0.5.2 crayon_1.3.4