Before we start
- As you call your flag
hiv_screen
, I assume you actually only need the observations you will flag. If so, you better select them instead of flagging them.
- If you will only use the results once, it is better to create a view than a table. This way you avoid a lot of disk access.
- All logic in a data step is applied after building the variable vector, except that in the where statement and I assume the same is true for the where clause in a SQL query. Therefore, it is faster to select data in a where statement/clause than to flag them.
- Data steps are not intrinsically slower than SQL queries. If they are, you don't use them well. For instance, with a keep statement, you select which variable you want and don't write others to disk anymore.
Macro's as an alternative for arrays
A macro is very comparable to pre-compiler code in, for instance C++. It adds statements to your program.
%macro any_ICD_in (string_list, ICD_count);
ICD1 in (&string_list)
%do icd_nr = 2 %to &ICD_count;
or ICD%icd_nr in (&string_list)
%end;
%mend;
%* test the macro;
%put %any_ICD_in('79571', 'R75');
Using the macro
in a data step
data inpatient_claims_2018;
set cl2018.ip_claims;
hiv_screen = %any_ICD_in('79571', 'R75');
covid19_screen = %any_ICD_in('I', 'dont', 'know');
...
run;
Improving performance by only reading relevant data
in a data step
data inpatient_claims_2018;
set cl2018.ip_claims;
where %any_ICD_in('79571', 'R75')
or %any_ICD_in('I', 'dont', 'know')
or ...
;
hiv_screen = %any_ICD_in('79571', 'R75');
covid19_screen = %any_ICD_in('I', 'dont', 'know');
...
keep hiv_screen covid19_screen ... <and all other variable you really need>;
run;
Alternatively avoid writing the data to disk at all
in a data step
data inpatient_claims_2018 / view=inpatient_claims_2018;
set cl2018.ip_claims;
hiv_screen = %any_ICD_in('79571', 'R75');
covid19_screen = %any_ICD_in('I', 'dont', 'know');
...
run;
I will not take the time to add SQL solutions today, but I might tomorrow.
LIBNAME CL2018
statement you are using ? What data base are you connecting to? Is it local network or a local/remote network data base ? What engine are you using ? – Richard