12
votes

I'm working on a fork of the RSQLServer package and am trying to implement joins. With the current version of the package, joins for any DBI-connected database are implemented using sql_join.DBIConnection. However, that implementation doesn't work well for SQL server. For instance, it makes use of USING which is not supported by SQL server.

I've got a version of this function sql_join.SQLServerConnection working (though not complete yet). I've based my function on sql_join.DBIConnection as much as possible. One issue I've had is that sql_join.DBIConnection calls a number of non-exported functions within dplyr such as common_by. For now, I've worked around this by using dplyr:::common_by, but I'm aware that that's not ideal practice.

Should I:

  1. Ask Hadley Wickham/Romain Francois to export the relevant functions to make life easier for people developing packages that build on dplyr?
  2. Copy the internal functions into the package I'm working on?
  3. Continue to use the ::: operator to call the functions?
  4. Something else?

Clearly with option 3, there's a chance that the interface will change (since they're not exported functions) and that the package would break in the longer term.

Sample code:

sql_join.SQLServerConnection <- function (con, x, y, type = "inner", by = NULL, ...) {
  join <- switch(type, left = sql("LEFT"), inner = sql("INNER"), 
                 right = sql("RIGHT"), full = sql("FULL"), stop("Unknown join type:", 
                                                                type, call. = FALSE))
  by <- dplyr:::common_by(by, x, y)
  using <- FALSE # all(by$x == by$y)
  x_names <- dplyr:::auto_names(x$select)
  y_names <- dplyr:::auto_names(y$select)
# more code
}
1
@hadley this could be a nice side-note / case study in your R Packages best practices section.JasonAizkalns
@NickK, Just a side comment: I had a similar problem with Google BigQuery and implemented sql_join with :::. Please take a look at this pull request. My guess is there is a very high probability that it should work with SQL Server right out of the box.akhmed
@NickK, also on topic: see this discussion in the (already merged) pull request. I was explicitly encouraged to re-use the dplyr function with ::: instead of re-implementing it.akhmed
@akhmed thanks. Looks like you faced similar issues to the SQL Server ones and had similar solutions! I'll try it and see.Nick Kennedy
That's great if it ends up being useful outside BigQuery too! (I guess the only change is: "JOIN EACH" will need to be replaced back to "JOIN" for SQL Server)akhmed

1 Answers

1
votes

It looks to me like you may not have to use those functions verbs. Since dplyr now put it's database functionality in dbplyr, the relevant code is here. I don't see the use of auto_names or common_by there.

I strongly recommend following the steps in Creating New Backends after reading SQL Translation.

It may also be worth reviewing some other alternative backends, such as Hrbrmaster's sergeant package for Apache Drill using JDBC.