I'm trying to select rows from one table ("positions") whose values in a particular column ("position") fall within the ranges defined in another table ("my_ranges"), and then to add a grouping tag from the "my_ranges" table.
I can do this with tibbles and a couple of purrr::map2() calls, but the same approach doesn't work with dbplyr database-backed tibbles. Is this expected behavior, and if so, is there a different approach I should take to use dbplyr for this kind of task?
Here's my example:
library("tidyverse")
set.seed(42)
my_ranges <-
  tibble(
    group_id = c("a", "b", "c", "d"),
    start = c(1, 7, 2, 25),
    end = c(5, 23, 7, 29)
  )
positions <-
  tibble(
    position = as.integer(runif(n = 100, min = 0, max = 30)),
    annotation = stringi::stri_rand_strings(n = 100, length = 10)
  )
# note: this works as I expect and returns a tibble with 106 obs. of 3 variables:
result <- map2(
  .x = my_ranges$start, .y = my_ranges$end,
  .f = function(x, y) {between(positions$position, x, y)}
) %>%
  map2(
    .y = my_ranges$group_id,
    .f = function(x, y) {
      positions %>%
        filter(x) %>%
        mutate(group_id = y)
    }
  ) %>%
  bind_rows()
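For comparison, the tibble version can also be phrased as a single inequality join instead of two map2() passes. This is a minimal sketch, assuming dplyr 1.1.0 or later (where join_by() and its between() helper were added). It uses a small, fixed stand-in for the random positions table (my own test values, not the data above) so the matches can be checked by hand:

```r
library(dplyr)

my_ranges <- tibble(
  group_id = c("a", "b", "c", "d"),
  start = c(1, 7, 2, 25),
  end = c(5, 23, 7, 29)
)
# Hypothetical fixed stand-in for the random `positions` table
positions <- tibble(
  position = c(0L, 3L, 7L, 24L, 27L),
  annotation = c("p", "q", "r", "s", "t")
)

# between(position, start, end) inside join_by() keeps pairs with
# start <= position <= end (inclusive, like dplyr::between()).
# A position inside several overlapping ranges gets one row per range,
# as in the map2() + bind_rows() version.
join_result <- positions %>%
  inner_join(my_ranges, by = join_by(between(position, start, end))) %>%
  select(position, annotation, group_id)

print(join_result)
```

Because the matching is expressed as a join condition on columns rather than as a precomputed logical vector, this is the kind of operation a database backend has a chance of translating.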
# next, make an in-memory db for testing:
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
# copy data to db
copy_to(con, my_ranges, "my_ranges", temporary = FALSE)
copy_to(con, positions, "positions", temporary = FALSE)
# get db-backed tibbles:
my_ranges_db <- tbl(con, "my_ranges")
positions_db <- tbl(con, "positions")
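Unlike a tibble, each of these db-backed tbls is a lazy query object, so `my_ranges_db$start` does not fetch the start column the way `my_ranges$start` does. My assumption is that it comes back as NULL here: map2() over NULL returns an empty list, and bind_rows() of an empty list is a tibble with 0 rows and 0 columns, which would match the result described below. This is a minimal sketch of reading column values explicitly with pull() or collect(), on a fresh in-memory SQLite connection:

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
my_ranges_db <- copy_to(
  con,
  tibble(
    group_id = c("a", "b", "c", "d"),
    start = c(1, 7, 2, 25),
    end = c(5, 23, 7, 29)
  ),
  "my_ranges"
)

# pull() runs the query and returns the `start` column as a plain vector
starts <- pull(my_ranges_db, start)

# collect() brings the whole (small) ranges table into R as a tibble,
# after which `$` works on it as it does on my_ranges
my_ranges_local <- collect(my_ranges_db)
print(my_ranges_local$start)

DBI::dbDisconnect(con)
```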
# note: this does not work as I expect, and instead returns a tibble with 0 observations of 0 variables:
# database range-based query:
db_result <- map2(
  .x = my_ranges_db$start, .y = my_ranges_db$end,
  .f = function(x, y) {
    between(positions_db$position, x, y)
  }
) %>%
  map2(
    .y = my_ranges_db$group_id,
    .f = function(x, y) {
      positions_db %>%
        filter(x) %>%
        mutate(group_id = y)
    }
  ) %>%
  bind_rows()
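Even with the ranges pulled into R, `positions_db$position` has the same `$` problem, and filter() on a lazy tbl needs a condition on columns rather than a precomputed logical vector. One way to keep the whole lookup in the database is to express it as a join on a range condition, which dbplyr can translate into a single SQL query. This is a minimal sketch, assuming dplyr >= 1.1.0 and dbplyr >= 2.3.0 (for inequality conditions in join_by() on database tables). It again uses a small, hypothetical fixed positions table:

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Both tables must live on the same connection for a database-side join
my_ranges_db <- copy_to(
  con,
  tibble(
    group_id = c("a", "b", "c", "d"),
    start = c(1, 7, 2, 25),
    end = c(5, 23, 7, 29)
  ),
  "my_ranges"
)
positions_db <- copy_to(
  con,
  tibble(
    position = c(0L, 3L, 7L, 24L, 27L),
    annotation = c("p", "q", "r", "s", "t")
  ),
  "positions"
)

# Inequality join: keep each (position, range) pair with
# start <= position <= end; overlapping ranges yield one row per group_id
db_query <- positions_db %>%
  inner_join(my_ranges_db, by = join_by(position >= start, position <= end)) %>%
  select(position, annotation, group_id)

show_query(db_query)          # prints the SQL that dbplyr generated
db_result <- collect(db_query) # runs the query and returns a local tibble

DBI::dbDisconnect(con)
```

collect() is only called at the end, so the range matching happens in SQLite rather than in R.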