
We have a large dataset in a SAS environment with 30 million records, and several small datasets with 100,000 records each.

We have to left join these smaller datasets with the large table, and each join takes around 30-40 minutes.

Running the job for 5-6 datasets individually takes a long time. If we merge all these small datasets into a single dataset and then do one left join, will it take less time than the individual joins? Also, space in WORK is limited, so we have to consider that as well.

How do you read those datasets? Do you have a SQL database there? Or do you make REST API calls? (evgeny.myasishchev)
@evgeny.myasishchev We use SAS Enterprise, so all the datasets are already in SAS; no API calls. (rns)
If you have a sufficient amount of RAM (the small tables are smaller than RAM), the perfect solution is a SAS hash table. (Robert Soszyński)
Put an index on the smaller data sets, use a data step, and then look up records by key (sketched below). (Gordon Linoff)
@GordonLinoff Should we merge all the tables and then create an index, or create an index on each individual table and then join them? And will that reduce the processing time? (rns)
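
A minimal sketch of the indexed lookup Gordon describes, with placeholder dataset and variable names:

proc datasets library=work nolist;
    modify SMALL_DATA_SET;
    index create YOUR_KEY_VARIABLE;   /* simple index named after the key variable */
quit;

data want;
    set BIG_DATA_SET;
    /* indexed lookup: _iorc_ is 0 when the key is found */
    set SMALL_DATA_SET key=YOUR_KEY_VARIABLE / unique;
    if _iorc_ ne 0 then do;
        _error_ = 0;                  /* suppress the "not found" log message */
        call missing(SMALL_VAR1);     /* clear unmatched lookup variables */
    end;
run;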

2 Answers


Try a hash join: load the small table into an in-memory hash object once, then stream the big table past it in a single data step.

data want;
    /* pull in the small table's variable definitions without reading a row */
    if 0 then set SMALL_DATA_SET;
    /* load the small table into memory once, on the first iteration */
    if _n_ = 1 then do;
        declare hash h(dataset: "SMALL_DATA_SET", multidata: 'y');
        h.defineKey("YOUR_KEY_VARIABLE");
        h.defineData(all: 'yes');
        h.defineDone();
    end;
    set BIG_DATA_SET;
    /* find() returns 0 on a match and fills the lookup variables;
       on a miss, clear them so the row still outputs (left-join behaviour) */
    if h.find(key: YOUR_KEY_VARIABLE) ne 0 then
        call missing(SMALL_VAR1, SMALL_VAR2); /* list the small table's non-key variables */
run;
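
Since each small table is only around 100,000 rows, it should fit comfortably in memory (as Robert Soszyński notes in the comments). You can also extend the pattern to look up several small tables in one pass over the 30-million-row table, so the big table is read only once and no intermediate results land in WORK. A minimal sketch, assuming two hypothetical small tables keyed on the same variable:

data want;
    if 0 then set SMALL_TABLE_1 SMALL_TABLE_2;    /* hypothetical names */
    if _n_ = 1 then do;
        declare hash h1(dataset: "SMALL_TABLE_1");
        h1.defineKey("YOUR_KEY_VARIABLE");
        h1.defineData(all: 'yes');
        h1.defineDone();
        declare hash h2(dataset: "SMALL_TABLE_2");
        h2.defineKey("YOUR_KEY_VARIABLE");
        h2.defineData(all: 'yes');
        h2.defineDone();
    end;
    set BIG_DATA_SET;
    /* one sequential read of the big table serves every lookup */
    if h1.find() ne 0 then call missing(T1_VAR);  /* table 1's non-key variables */
    if h2.find() ne 0 then call missing(T2_VAR);  /* table 2's non-key variables */
run;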

The best solution for this kind of scenario is the MODIFY statement, which updates the master dataset in place without recreating it from scratch. Always keep a backup of the dataset you are modifying, because if anything goes wrong during processing the dataset can be left in an unusable state.

This paper has loads of info, but there's plenty of documentation to be found.

http://www.lexjansen.com/pnwsug/2006/PN01CurtisMackModify.pdf
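
For example, the basic pattern looks roughly like this (a sketch only; it assumes MASTER has an index on YOUR_KEY_VARIABLE, and TRANSACTIONS is a placeholder name):

data MASTER;
    set TRANSACTIONS;                      /* one row per update */
    modify MASTER key=YOUR_KEY_VARIABLE;   /* indexed lookup into the master */
    if _iorc_ = 0 then do;
        /* assign updated values from transaction-only variables here, then: */
        replace;                           /* rewrite the matched record in place */
    end;
    else _error_ = 0;                      /* no match: clear the error flag */
run;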

Cheers