I'm pulling data into SAS from a SQL server database with a (SAS DI) extract transformation. The data in the table goes back several years and has a little over 16 million rows; I only need data for the last few years which should amount to roughly 2.6 million rows.
So my extract uses a datetime filter. The autogenerated proc sql looks as follows (after obfuscating the libref, table, and column names):
%let SYSLAST = %nrquote(LIBREF.SRC_TBL);
proc sql;
create table work.WFJVX4PU as
select
col1,
col2,
dt_col,
col3
from &SYSLAST
where dt_col >= &start_dt
;
quit;
The code works and returns the desired rows, but it takes far too long to run when compared with executing a similar query directly on the SQL server. When inspecting the statistics of the execution I discovered that the entire table was brought into SAS before the where clause was applied.
I remember reading that this will occur when attempting to use filters that apply functions to source data (a la datepart(dt_col) >= input("&start_date",date9.)
) which is why I tried to pass the datetime used by the filter directly to the SQL engine. I also tried the "datepart
" approach and got the same result.
Is there something else I should be doing to apply the filter server side before bringing the data into SAS? This isn't the experience I've had in the past when working with other database tables (e.g. Teradata, MySQL, Oracle, etc).
Additional Details:
The macro variable start_date
is defined using the SAS DI prompt creation tool; the auto-generated code for this is
%let start_date_label = 730 days ago (July 02, 2017);
%let start_date_rel = D-730D;
%let start_date = 02Jul2017;
I then create the datetime macro variable start_dt
by executing the following in the precode of the job:
%let start_dt = %sysfunc(dhms("&start_date"d,0,0,0));
Below is the libname statement (after obfuscation):
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MY_DB_NAME;set nocount on;" Datasrc=MY_DATA_SRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
If I modify the where clause to use a literal of the form '2019-06-25'
then SAS throws the error
ERROR: Expression using greater than or equal (>=) has components that are of different data types.
because the dt_col
field is of type numeric (format datetime22.3
) on the SAS instance of the SQL server table that is registered from the library. If I use a literal as follows:
dt_col >= '25JUN2019'D
then I do get the desired set of rows (despite comparison between a datetime field and a date literal), but the query still takes a long time to execute and the job statistics indicate SAS is still grabbing all 16 million rows to perform this task.
UPDATE
I'm having issues following Tom's advice below. If I execute the following code:
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MYDB;set nocount on;" Datasrc=MYSRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
/*---- Map the columns ----*/
proc datasets lib = work nolist nowarn memtype = (data view);
delete sql_psthru2;
quit;
proc sql;
create table work.sql_psthru2 as
select
col1,
col2,
dt_col,
col3
from MYLIB.MYTBL
where dt_col>= '25JUN2019'd
;
quit;
then I get data from the database, but if I execute
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MYDB;set nocount on;" Datasrc=MYSRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
proc sql;
connect using MYLIB;
create table work.sql_psthru as
select * from connection to MYLIB
(select
col1
,col2
,dt_col
from MYTBL
where dt_col >= '2019-06-25'
)
;
quit;
then I receive the error
ERROR: CLI error trying to establish connection: [DataDirect][ODBC lib] Data source name not found and no default driver specified
immediately after the connect using MYLIB;
line.
I've tried many variants of the explicit passthru which I've found all over the internet which I won't post here, but none worked.
An interesting side note is that I believe the reason the SAS DI statistics are indicating that all 16 million rows are returned is that the extract transformation auto-generates the following macro:
%macro etls_recordCheck;
%let etls_recCheckExist = %eval(%sysfunc(exist(MYLIB.MYTBL, DATA)) or
%sysfunc(exist(MYLIB.MYTBL, VIEW)));
%if (&etls_recCheckExist) %then
%do;
proc sql noprint;
select count(*) into :etls_recnt from MYLIB.MYTBL;
quit;
%end;
%mend etls_recordCheck;
%etls_recordCheck;
So the culprit in the long execution time is not that the full dataset is being returned to SAS (I removed the macro and the code still takes far too long to run).
libname
(redacting privileged info). Please also show the definition to&start_dt
. What happens if you pass a literal, '2019-01-01' (non-date)? – Parfait"&start_date:00:00:00"dt
to specify the datetime constant with midnight for time part, but I doubt it will help with your issue. – Tom