I have this proc sql query in my current code. Unfortunately, I'm dealing with over 10 million records, so it takes hours to run. I've been trying to convert it to a data step, thinking it would run much quicker. However, I can't seem to get the same data results. If anyone can help me with the data step I'd greatly appreciate it. Or if you have suggestions on how I can make the proc sql run more efficiently.
Here is my proc sql query:
proc sql;
create table test as
select *
from table1 a
where exists (select 1
from table2 b
where b.acct_id = a.acct_id);
quit;
This is the data step I tried converting it to:
proc sort data=table1; by acct_id; run;
proc sort data=table2; by acct_id; run;
data test;
merge table1 (in=a)
table2 (in=b);
by acct_id;
if a and b;
run;
select * from table1 where acct_id in (select acct_id from table2)
perform any faster? (That query might make the intent more clear to the query planner - hopefully saving the need for indexing/sorting/joining) – Jon Clements