2
votes

I would like to construct a function that takes the column name as a string variable and then calculates some basic statistics on that column and return a dataframe.

The function works fine when column names are hardcoded, but I have been struggling with a methodology to do the same when supplying a variable.

I have used the example in https://dplyr.tidyverse.org/reference/summarise.html but this does not appear to work and generates

Error: Can't convert a NULL to a quosure

if I use: using variable column names in dplyr summarise

I get Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic

if I use R - dplyr 0.4.1 : How to summarise by a column name in a function

I get the error Error in !rlang::sym(column_to_summerise) : invalid argument type

I have a data frame of this structure and sample data.

Date_Received Discipline Specimen_Number Specimen_Details_Booked_By Date_Time_Booked_In Date_Booked_In
1    2019-09-04          C  B,19.4015575.P                       CMSE 2019-09-04 01:03:00     2019-09-04

Date_Time_Authorised Date_Time_Collected  Date_Time_Received Set_Code Location Location_Group_Code
1  2019-09-04 01:31:00 2019-09-04 00:30:00 2019-09-04 00:55:00     PRO2     QOPD                <NA>

Analytical_Category tat_auth_collected tat_auth_recieved travel_time reception_time hour_booked_in
1                   E          1.0166667         0.6000000   0.4166667     0.13333333              1

and I am building a generic function :

summariseData <- function(df, column_to_summerise, target, kpi_target)
{
  calc_df <- df %>% 
    group_by(Date_Received) %>% 
    dplyr::summarise_(med=median(.data[[column_to_summerise]], na.rm = TRUE), 
                     per95=quantile(.data[[column_to_summerise]], probs = kpi_target, na.rm = TRUE), 
                     In_Target = sum(.data[[column_to_summerise]] <= target, na.rm = TRUE), 
                     Out_Target = sum(.data[[column_to_summerise]] > target, na.rm = TRUE), 
                     Total_Data = count(df)) 
  return(calc_df)
}

or using:

summariseData <- function(df, column_to_summerise, target, kpi_target)
{
  calc_df <- df %>% 
    group_by(Date_Received) %>% 
    dplyr::summarise_(med=median(get(column_to_summerise), na.rm = TRUE), 
                     per95=quantile(get(column_to_summerise), probs = kpi_target, na.rm = TRUE), 
                     In_Target = sum(get(column_to_summerise) <= target, na.rm = TRUE), 
                     Out_Target = sum(get(column_to_summerise) > target, na.rm = TRUE), 
                     Total_Data = count(df)) 
  return(calc_df)
}

which I would run thus:

# In hours
turnaround_time_target <- 4

# in minutes
reception_target <- 15

# in minutes
travel_time_target <- 15

# In percent
kpi_trigger <- 98

#percentile for KPI
percentile_for_kpi <- kpi_trigger / 10

df_collection <- summariseData(original_df, "tat_auth_collected", turnaround_time_target, percentile_for_kpi)  
1
library(dplyr) ds <- data.frame(result=1:100, sex=c("male","female")) basic_summaries <- function(mydata, dv, iv) { #dataset <- enquo(dataset) dv <- enquo(dv) iv <- enquo(iv) mydata %>% group_by(!!iv) %>% summarise_at(vars(!!dv), list(mean, sd)) } basic_summaries(mydata = ds, iv = "sex", dv = "result") [[Now you can adapt for what you want.]]Luis

1 Answers

0
votes

using the outline in the comments by Luis. Translating into my function:

summariseData <- function(df, column_to_summerise, target, kpi_target)
{
  column_to_summerise <- enquo(column_to_summerise)

  calc_df <- df %>% 
    group_by(Date_Received) %>% 
    dplyr:: summarise(med=median(!!column_to_summerise, na.rm = TRUE), 
                      per95=quantile(!!column_to_summerise, probs = kpi_target, na.rm = TRUE), 
                      In_Target = sum(!!column_to_summerise <= target, na.rm = TRUE), 
                      Out_Target = sum(!!column_to_summerise > target, na.rm = TRUE), 
                      Total_Data = n()) 
  return(calc_df)
}

Key point is to use the enquo() function prior to using !!

From (https://dplyr.tidyverse.org/articles/programming.html) By analogy to strings, we don’t want "", instead we want some function that turns an argument into a string. That’s the job of enquo(). enquo() uses some dark magic to look at the argument, see what the user typed, and return that value as a quosure

In dplyr (and in tidyeval in general) you use !! to say that you want to unquote an input so that it’s evaluated