The first approach that comes to mind is converting dates to text, because there are already dbplyr translations for different forms of text manipulation. This approach depends on as.character
to convert dates to character, and substr
to extract the year, month, or day as text. This can then be converted to numeric and manipulated further.
(1) Set up simulated database connection to test translation (pick your preferred SQL flavor):
library(dplyr)
library(dbplyr)
df = data.frame(start_dates = c('2020-01-31', '2020-02-28', '2020-03-31'))
# simulate a connection to test translation (pick your preferred flavor)
df = tbl_lazy(df, con = simulate_mssql())
# df = tbl_lazy(df, con = simulate_hive())
# df = tbl_lazy(df, con = simulate_impala())
# df = tbl_lazy(df, con = simulate_oracle())
# df = tbl_lazy(df, con = simulate_postgres())
# df = tbl_lazy(df, con = simulate_mysql())
# df = tbl_lazy(df, con = simulate_sqlite())
(2) Example - extract date components, increment year, and recombine:
output = df %>%
mutate(text_date = as.character(start_dates)) %>%
mutate(text_year = substr(text_date, 1, 4),
text_month = substr(text_date, 6, 7),
text_day = substr(text_date, 9, 10)) %>%
mutate(num_year = as.numeric(text_year),
num_month = as.numeric(text_month),
num_day = as.numeric(text_day)) %>%
select(start_dates, num_year, num_month, num_day) %>%
mutate(next_year = num_year + 1) %>%
mutate(next_year_text_date = paste0(next_year, '-', num_month, '-', num_day)) %>%
mutate(next_year_date = as.Date(next_year_text_date)) %>%
select(start_dates, next_year_date)
Calling show_query(output)
then gives the following translation, but not as nicely formatted. I know the nested queries are not considered good SQL practice, but this is how dbplyr translation works.
SELECT `start_dates`
, TRY_CAST(`next_year_text_date` AS DATE) AS `next_year_date`
FROM (
SELECT `start_dates`
, `num_year`
, `num_month`
, `num_day`
, `next_year`
, `next_year` + '-' + `num_month` + '-' + `num_day` AS `next_year_text_date`
FROM (
SELECT `start_dates`
, `num_year`
, `num_month`
, `num_day`
, `num_year` + 1.0 AS `next_year`
FROM (
SELECT `start_dates`
, TRY_CAST(`text_year` AS FLOAT) AS `num_year`
, TRY_CAST(`text_month` AS FLOAT) AS `num_month`
, TRY_CAST(`text_day` AS FLOAT) AS `num_day`
FROM (
SELECT `start_dates`
, `text_date`
, SUBSTRING(`text_date`, 1, 4) AS `text_year`
, SUBSTRING(`text_date`, 6, 2) AS `text_month`
, SUBSTRING(`text_date`, 9, 2) AS `text_day`
FROM (
SELECT `start_dates`
, TRY_CAST(`start_dates` AS VARCHAR(MAX)) AS `text_date`
FROM `df`
) `q01`
) `q02`
) `q03`
) `q04`
) `q05`
(3) Extract components, compact:
output = df %>%
mutate(num_year = as.numeric(substr(as.character(start_dates), 1, 4)),
num_month = as.numeric(substr(as.character(start_dates), 6, 7)),
num_day = as.numeric(substr(as.character(start_dates), 9, 10)))
With SQL translation from show_query(output)
much shorter:
SELECT `start_dates`
, TRY_CAST(SUBSTRING(TRY_CAST(`start_dates` AS VARCHAR(MAX)), 1, 4) AS FLOAT) AS `num_year`
, TRY_CAST(SUBSTRING(TRY_CAST(`start_dates` AS VARCHAR(MAX)), 6, 2) AS FLOAT) AS `num_month`
, TRY_CAST(SUBSTRING(TRY_CAST(`start_dates` AS VARCHAR(MAX)), 9, 2) AS FLOAT) AS `num_day`
FROM `df`
Hopefully this works on all the flavors of SQL that dbplyr can translate to. As I do not have access to every SQL flavor to test it, comments from people who have tested it on a specific SQL flavor would be helpful.
RPostgres
, you can uselubridate
functions quite readily, as illustrated here. I think one can do all the things you ask for in this way. If this doesn't work with other SQL implementations (e.g., MySQL), this may be require work to the associated packages (e.g., inRMySQL
). – Ian Gow