I'm seeing a mutate error that appears or disappears depending on whether unrelated code has been run in between. For example:

  • I initialize the data and run a block of mutate code successfully.
  • I run a group_by/summarise code block, which runs successfully but emits warnings.
  • The same original mutate code, on the same original data, now throws a mutate error, although it ran successfully before!
  • I run an unrelated mutate that doesn't modify the original data; it runs successfully.
  • I run the original mutate code a third time, and it runs successfully again!

library(tidyverse)
library(scales)
#> 
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#> 
#>     discard
#> The following object is masked from 'package:readr':
#> 
#>     col_factor

df_test <- tibble(
  group = c('a', 'a', 'b', 'b', 'b'),
  hour = parse_factor(as.character(c(1, 2, 1, 2, 1))),
  x = c(1, 2, 3, 4, 5),
  y = c(5, 6, 7, 8, 9)
)

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )

summarise_df_input <- function(.data, func, group_vars) {
  df_agg <- .data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(across(everything(), func))
  
  return(df_agg)
}

df_grouped <- df_test %>% summarise_df_input(mean, 'group')
#> Warning in mean.default(hour): argument is not numeric or logical: returning NA

#> Warning in mean.default(hour): argument is not numeric or logical: returning NA

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )
#> Error: Problem with `mutate()` input `..2`.
#> x subscript out of bounds
#> ℹ Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

return_data <- df_test %>%
  dplyr::mutate(
    across(.cols = c(x, y), scales::label_dollar())
  )

return_data <- df_test %>%
  dplyr::mutate(
    hour = paste(hour, ':00'),
    across(.cols = c(x, y), scales::label_dollar())
  )

Created on 2021-03-05 by the reprex package (v1.0.0)

The reprex above shows what's going on. Does anyone have any idea what's happening? This is with dplyr 1.0.5.

1 Answer

The function uses everything(); it should use where(is.numeric) instead. Besides the grouping column there is an 'hour' column, which is a factor, and mean() only works on numeric (or logical) variables, which is why the warnings appear.

summarise_df_input <- function(.data, func, group_vars) {
  .data %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(across(where(is.numeric), func), .groups = 'drop')
}

Testing:

df_test %>% 
    summarise_df_input(mean, 'group')
# A tibble: 2 x 3
#  group     x     y
#* <chr> <dbl> <dbl>
#1 a       1.5   5.5
#2 b       4     8 
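
To make the difference between everything() and where(is.numeric) concrete, here is a minimal sketch. It rebuilds the question's data with base factor() in place of parse_factor() (an assumption made only to keep the example small) and checks which columns the predicate keeps and what mean() does with a factor.

```r
library(dplyr)

# Same shape as the question's df_test; factor() stands in for parse_factor().
df_test <- tibble(
  group = c("a", "a", "b", "b", "b"),
  hour  = factor(c("1", "2", "1", "2", "1")),
  x     = c(1, 2, 3, 4, 5),
  y     = c(5, 6, 7, 8, 9)
)

# where(is.numeric) keeps only the numeric columns;
# the character column (group) and the factor column (hour) drop out.
numeric_cols <- names(select(df_test, where(is.numeric)))
print(numeric_cols)

# mean() on a factor does not stop with an error: it warns and returns NA.
# The warning is suppressed here only to keep the output quiet.
hour_mean <- suppressWarnings(mean(df_test$hour))
print(hour_mean)
```

With everything(), the hour column is handed to mean() once per group, which matches the two warnings in the question's reprex.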

As for the error itself, it may be a bug in dplyr. Changing the order of the arguments, so that across() comes before the hour assignment, bypasses the error:

return_data <- df_test %>%
  dplyr::mutate(
    across(.cols = c(x, y), scales::label_dollar()),
    hour = paste(hour, ':00')
  )
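
Reordering should not change the result, because neither expression reads a column that the other one writes. A rough way to check this, assuming a session and dplyr version in which both orderings run without the error, and again using factor() as a stand-in for parse_factor():

```r
library(dplyr)
library(scales)

# Same shape as the question's df_test; factor() stands in for parse_factor().
df_test <- tibble(
  group = c("a", "a", "b", "b", "b"),
  hour  = factor(c("1", "2", "1", "2", "1")),
  x     = c(1, 2, 3, 4, 5),
  y     = c(5, 6, 7, 8, 9)
)

# Original argument order: hour first, then across().
hour_first <- df_test %>%
  mutate(
    hour = paste(hour, ":00"),
    across(.cols = c(x, y), label_dollar())
  )

# Reordered: across() first, then hour.
across_first <- df_test %>%
  mutate(
    across(.cols = c(x, y), label_dollar()),
    hour = paste(hour, ":00")
  )

# Compare the two results; TRUE means the reordering is a safe drop-in.
same_result <- isTRUE(all.equal(hour_first, across_first))
print(same_result)
```

This only confirms that the two forms are equivalent; it does not explain why the original order fails after the warnings.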