2
votes

I'm very new to anything SQL-related, but I feel like what I'm trying to do must be possible. I've looked at related questions, but they don't seem to be trying to do quite the same thing as I am.

I want to write a function to use in a dbplyr pipe, so I don't have to repeat the code if I need it again later.

I realise not all functions in R translate nicely to SQL, but in this case, case_when() does. It works fine when I use the bare code, but I don't know how to wrap it in a function. The function I write gets passed directly into SQL, and clearly it's not a function that exists in SQL.

Is it possible to get dbplyr to look at the code inside my function, and translate that into SQL?

library(pacman)
p_load(dplyr, dbplyr, RSQLite)

# Set up fake database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
test = tibble(a = c("a", "b", "c"))
test
#> # A tibble: 3 x 1
#>   a    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c
copy_to(con, test)
test_db <- tbl(con, "test")

Using case_when() explicitly on test_db works fine.

test_db %>% 
    mutate(b = case_when(
        a == "a" ~ "AAA",
        TRUE ~ a
    ))
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#>   a     b    
#>   <chr> <chr>
#> 1 a     AAA  
#> 2 b     b    
#> 3 c     c

Then I put my case_when() into a function, so I can reuse it. That works fine when running straight on test.

func_aaa <- function(x) {
    case_when(
        x == "a" ~ "AAA",
        TRUE ~ x
    )
}

# The function works fine on the regular dataframe
test %>% 
    mutate(b = func_aaa(a))
#> # A tibble: 3 x 2
#>   a     b    
#>   <chr> <chr>
#> 1 a     AAA  
#> 2 b     b    
#> 3 c     c

But it doesn't work on test_db. Looking at the query it generates, it's clear that dbplyr just passed my function name straight into SQL.

test_db %>% 
    mutate(b = func_aaa(a))
#> Error: no such function: func_aaa

test_db %>% 
    mutate(b = func_aaa(a)) %>% 
    show_query()
#> <SQL>
#> SELECT `a`, func_aaa(`a`) AS `b`
#> FROM `test`

Created on 2020-05-13 by the reprex package (v0.3.0)

1

1 Answers

2
votes

There may be simpler approach, but one option is to use a function that operates on and returns the entire data frame, rather than running the function inside mutate:

func_aaa2 = function(data, var, newvar, val, newval) {
  data %>% 
    mutate({{newvar}} := case_when(
      {{var}} == val ~ newval,
      TRUE ~ {{var}}))
}

test_db %>% 
  func_aaa2(a, "b", "a", "aaa")
  a     b    
  <chr> <chr>
1 a     aaa  
2 b     b    
3 c     c
test_db %>% 
  func_aaa2(a, "b", "a", "aaa") %>% show_query()
SELECT `a`, CASE
WHEN (`a` = 'a') THEN ('aaa')
WHEN (1) THEN (`a`)
END AS `b`
FROM `test`