0
votes

I'm a bit confused about the use of indexes in SAS.

I found 2 ways of implementing them :

with PROC SQL

CREATE <UNIQUE> INDEX index-name 
 ON table-name ( column <, ... column>);  

with PROC DATASETS

proc datasets library=college;
   modify survey;
      index create class;
      index create major;
run;

proc datasets library=college;
   modify maillist;
      index create zipid=(zipcode schoolid);
run;

proc datasets library=college;
   modify student;
      index create idnum / unique;
run;

proc datasets library=college;
   modify student;
      index create religion / nomiss;
run;

My first question is, do they do the same thing ? I would assume they do but given the doc doesn't seem to cross reference them and that there is no option for nomiss in PROC SQL, I have some doubts.


My second question is, given I have Oracle libraries and SAS libraries, what is the relations between indices defined in ORACLE to those defined in SAS ?

The following query runs very fast, because the table was indexed in the Oracle DB:

PROC SQL;
  SELECT * FROM ORACLELIB.MY_TABLE
  WHERE RELATION_ID = 1097;
QUIT;

But if I try to do a simple inner join on a 1 column, 1 observation, indexed table located in my WORK library, the query takes an awful lot of time:

data TEST;
   input RELATION_ID;
   datalines;
   1097
;

proc datasets library=WORK;
   modify TEST;
      index create RELATION_ID;
run;

PROC SQL;
  SELECT * FROM ORACLELIB.MY_TABLE t1
  INNER JOIN TEST t2
  ON t1.RELATION_ID = t2.RELATION_ID;
QUIT;

This is also slow :

PROC SQL;
  SELECT * FROM ORACLELIB.MY_TABLE t1
  WHERE RELATION_ID IN (SELECT RELATION_ID FROM TEST);
QUIT;

I also tried to create my test table in an oracle library, which is not located on the same oracle server) :

data ORACLELIB2.TEST;
   input RELATION_ID;
   datalines;
   1097
;

This seems to be just as slow (In this case I cannot index it due to restrictions but it's only one observation).


Which leads to my last question, what can I do to make this join run as fast as it should ?


Additional notes:

I have 3 servers:

  • the main ORACLE server, where I cannot write, which contains the library featuring MY_TABLE
  • Another ORACLE server, where I can write.
  • A SAS server, where I can write

My main use will be queries of the following type:

CREATE TABLE NEWTABLE AS
SELECT *columns* FROM smaller_table t1 LEFT JOIN MY_TABLE t2
ON t1.RELATION_ID = t2.RELATION_ID and t1.other_col = t2.other col
WHERE *additional restrictions on t2*;
3

3 Answers

2
votes

You will want to investigate the SASTRACE= option (9.2 docs) that will log what the remote library engine is doing. I would have expected the engine to do it's own implicit pass-through on simple joins. The engine will translate some SAS functions and grouping clauses automatically to Oracle side equivalents.

If you are unable to get server side to operate in expected fast ways, for small sets of IDs {a couple thousand}) consider code generating an IN list and see if that performs better.

proc sql noprint;
  select relation_id into :id_list separated by ',' from test;

  SELECT * FROM ORACLELIB.MY_TABLE t1
  WHERE RELATION_ID IN (&id_list);
quit;

If the ids are character, the code generator will have to quote() the values in the select.

1
votes

Regarding

If I move my test table to an oracle library these queries are still slow (I cannot index it in this case due to restrictions but it's only one observation).

How did you move to Oracle ? p.s. ORACLELIB is not a valid SAS libref, so I'm guessing that is pseudo-code.

Take a look at SAS/ACCESS® 9.4 for Relational Databases: Reference, Ninth Edition, Example: Create and Join a Permanent Table and a Temporary Table.

The example features dbmstemp=yes libname option and

shows how to use this option to create a permanent and temporary table and then join them in a query.

You could also upload the test table data into a connection session TEMP table, and then selecting the result set via pass-through SQL query. A pass-through query will let you have full control over the query include host specific features (for example Oracle hints that force index usage) that the SAS remote library engine may not utilize.

Regardless of the join way, the inter-system transfer is likely one of the most time consuming aspects. You want to find the way that performs the most work on the least transfers.

1
votes

Actually, there are 3 ways to create an index in SAS:

  1. Proc Datasets/Modify/Index Create
  2. a dataset option (index=), and
  3. Proc SQL/ Create Index

AFAIK, the first two are the only way to specify options in addition to Unique.

The answer to how to make the join faster will likely involve a combination of (pass-thru) processing in your Oracle DB and pre/post processing in SAS. And may or may not involve creating indexes.

This will depend on:

  • a) where your data is and data size(s), and
  • b) what you're trying to achieve in term of result.

Can you provide more info?

EDIT: ok, large table in Oracle, small table in SAS, small table being the driver of output rows

Without knowing the absolute details, the reason your query was slow had little to do with indexes but rather to do with the fact that you "pulled" through the data from the large Oracle table to satisfy the query.

Easy to fix, as per the last sentence of Richard's updated answer: transfer the small table to Oracle...

To be clear, transfer the small table to Oracle, run the query there and save the results there. Use an appropriate transfer facility to export the data to SAS if required...