I am looking for solutions or ideas how to speed the processing of large data sets in sas.
What would you recommend?
What is better data step or proc sql procedure?
I am looking for solutions or ideas how to speed the processing of large data sets in sas.
What would you recommend?
What is better data step or proc sql procedure?
Speeding up your data processing depends on where your data is saved.
Your data can be either in:
Use SAS Data Step when:
Use Proc SQL when:
You can do a SQL "Pass Through" where you send SQL code to be executed on the DB server and only the output is sent to SAS (instead of bringing the entire tables through the network to SAS and then filter it),
You want to query SAS Tables but prefer SQL joins to data step merges.
Another topic you should consider is efficiency programming; where you are optimising your query and look-ups.
Anecdotally from my experience Proc SQL
seems faster.
Here are two tips on speeding up queries with Proc SQL
:
In general, you want to rule out as much data as possible when querying. If you are usingProc SQL
, the order of the restrictions in the where clause matters. Put the most restrictive parts first.
For example, if I'm querying a database for teachers with the last name "JONES", that were hired after Jan 2005, I would structure my where clause like this: where last_name = 'JONES' and hire_date > 200501
I would do this because last name is likely to exclude more records than the hire date restriction.
When possible, don't use Select *
instead, list out the specific columns that you need. Remember, even if you are doing a calculation with a column, you don't have to include that column in your select
statement.
Here is a very useful resource for understanding how to use proc sql
efficiently. I recommend reading it in it's entirety if you do a lot of work with large data sets in SAS.