I am trying to write a function that automates some calculations I want to run on different variables of my data frame.

Here is what my data frame looks like:

head(SoilGeology, n=5)  
# A tibble: 5 x 12
  Year  Zone            SubZone         Au_ppm Ag_ppm Cu_ppm Pb_ppm Zn_ppm As_ppm Sb_ppm Bi_ppm Mo_ppm
  <chr> <chr>           <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 1990  BugLake         BugLake          0.007    3.7     17     27     23      1      1     NA      1
2 1983  Johnny Mountain Johnny Mountain  0.01     1.6     71     63    550      4     NA     NA     NA
3 1983  Khyber Pass     Khyber Pass      0.12    11.5    275    204   8230    178      7     60     NA
4 1987  Chebry          Ridge Line Grid  0.05     2.2     35     21    105     16      6     NA     NA
5 1987  Chebry          Handel Grid      0.004    1.3     29     27    663     45      2     NA     NA

The function I wrote looks like this:

library(dplyr)
library(ggplot2)
summary_statistics <- function(df, st, elt){  

# df = data frame, st = element name in string form, elt = element (bare column name)

  # tests
  if(!is.data.frame(df)){
    print("The table is not a data frame.")
    return(NULL)}  

  if(!is.character(st)){
     print('st is not in string form.')
     return(NULL)}

  if(!(st %in% colnames(df))){ 
    print("The element is not in the data frame.")
    return(NULL)}

  x <- list() # create our output list

  # Summary statistics
  x$stat <- df %>%
    filter(!is.na(elt)) %>%
    group_by(Year, Zone, SubZone) %>%
    summarise(
      n = sum(!is.na(elt)),
      min = min(elt),
      max = max(elt),
      mean = mean(elt),
      sd = sd(elt))

  # Boxplot
  x$boxplot <- df %>%
    group_by(Year, Zone, SubZone) %>%
    filter(n() > 40 & !is.na(elt)) %>%
    ggplot(mapping = aes(Zone, elt, color = Year)) +
    geom_boxplot() +
    scale_y_log10() +
    coord_flip()

  return(x)
}

I get the following error when I call it:

Ag <- summary_statistics(SoilGeology,'Ag_ppm', Ag_ppm)
Error in filter_impl(.data, quo) : 
  Evaluation error: object 'Ag_ppm' not found.

Outside of the function, my code works fine.

Any insights on why my function is not working?

1 Answer

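The problem is most likely caused by non-standard evaluation (NSE) in dplyr. Inside your function, filter() and summarise() evaluate elt, which forces the argument Ag_ppm as an ordinary variable in the calling environment rather than as a column of df, hence "object 'Ag_ppm' not found".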

The "Programming with dplyr" vignette is very instructive on this topic.

The short answer for your situation (this should work):

  • At the beginning of your function, capture the input as a "quosure": elt <- enquo(elt)
  • In x$stat and x$boxplot, "tidy-evaluate" the input by replacing elt with !!elt
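
As a rough illustration of the two steps above, here is a minimal sketch of the summary-statistics part only. The helper name element_stats is hypothetical, and the toy data frame just copies a few columns from the rows shown in the question; it assumes a dplyr version that supports enquo() and !! (0.7.0 or later).

```r
library(dplyr)

# Toy data: a subset of the columns from the rows printed in the question.
SoilGeology <- data.frame(
  Year    = c("1990", "1983", "1983", "1987", "1987"),
  Zone    = c("BugLake", "Johnny Mountain", "Khyber Pass", "Chebry", "Chebry"),
  SubZone = c("BugLake", "Johnny Mountain", "Khyber Pass",
              "Ridge Line Grid", "Handel Grid"),
  Ag_ppm  = c(3.7, 1.6, 11.5, 2.2, 1.3),
  Sb_ppm  = c(1, NA, 7, 6, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the original function): summary statistics only.
element_stats <- function(df, elt) {
  elt <- enquo(elt)                      # capture the bare column name as a quosure

  df %>%
    filter(!is.na(!!elt)) %>%            # unquote so the column is looked up in df
    group_by(Year, Zone, SubZone) %>%
    summarise(
      n    = sum(!is.na(!!elt)),
      min  = min(!!elt),
      max  = max(!!elt),
      mean = mean(!!elt),
      sd   = sd(!!elt)                   # NA for groups with a single row
    ) %>%
    ungroup()
}

ag_stats <- element_stats(SoilGeology, Ag_ppm)
sb_stats <- element_stats(SoilGeology, Sb_ppm)
```

The column is passed bare (Ag_ppm, not "Ag_ppm"), so the separate string argument is no longer needed for the dplyr calls. The same unquoting should carry over to the plot, for example aes(Zone, !!elt, color = Year), assuming ggplot2 3.0.0 or later, which is when aes() gained tidy-evaluation support.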

You can also look at this link, which has useful insights.