I am attempting to narrow-down the resulting SAS dataset of a pass-through query (i.e., using PROC SQL, not LIBNAME) to Hadoop by sending additional conditions in the inner (Hadoop-side) WHERE clause.
For example, the code that works looks like this (the dtpart macro variable is defined earlier in order to down-select to a single partition in the HDFS) :
proc sql;
connect to hadoop (server="&srvid" port=10000 user="&uid" pw="&passwd" schema=default );
create table work.creative_lkup as
select
advertiser_id,
creative_id,
creative,
rendering_id,
creative_type,
input("&dtpart.", yymmdd10.) as last_updt_dt format=mmddyy10.
from connection to hadoop (
select
`advertiser id` as advertiser_id,
`creative id` as creative_id,
creative,
`rendering id` as rendering_id,
`creative type` as creative_type
from default.match_table_creatives
where date_partition = "&dtpart."
)
WHERE advertiser_id = '12345';
disconnect from hadoop;
quit;
NOTE: As you can see, the field names in Hadoop are not SAS-standard (hence the back-ticks around the names containing spaces) and for some tables, the field and table names get really lengthy. This makes it impractical to simply use a LIBNAME statement for my Hadoop connection, and therefore I must use a pass-through query in PROC SQL using the CONNECT TO.
What I'd like to do is move the "WHERE advertiser_id = '12345'" portion inside the pass-through to Hadoop. Something like this:
proc sql;
connect to hadoop (server="&srvid" port=10000 user="&uid" pw="&passwd" schema=default );
create table work.creative_lkup as
select
advertiser_id,
creative_id,
creative,
rendering_id,
creative_type,
input("&dtpart.", yymmdd10.) as last_updt_dt format=mmddyy10.
from connection to hadoop (
select
`advertiser id` as advertiser_id,
`creative id` as creative_id,
creative,
`rendering id` as rendering_id,
`creative type` as creative_type
from default.match_table_creatives
where date_partition = "&dtpart."
and `advertiser id` = '12345'
);
disconnect from hadoop;
quit;
Unfortunately, when I do this, I get a nice vague error from Hadoop passed back to SAS:
ERROR: Prepare error: Error while processing statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
My question is: are there any settings, options, or other items that I could include in my PROC SQL statement in order to allow proper WHERE clause utilization?
It appears that the problem only occurs when there is an AND in the WHERE clause. I've read a few things about SPDE (the ACCELWHERE= option sounds promising), but I have no idea how to go about using such options in PROC SQL and the reference materials I've found online seem to only point to use in a LIBNAME statement.
Thank you.