0
votes

I am looking for solutions or ideas how to speed the processing of large data sets in sas.

What would you recommend?

What is better data step or proc sql procedure?

3
What kind of processing are you doing specifically? Lots of queries, calculations etc.? Are you dealing with a relational database?Jarom
You're off to a good start, yet it would be helpful if you could provide more information.Mozahler
I'll up vote your question, as it is still generic enough to help other sas beginners with the same question.momo1644

3 Answers

1
votes

Speeding up your data processing depends on where your data is saved.

Your data can be either in:

  • SAS Table,
  • Database Table (Miscrosfot SQL, Oracle, DB2, MYSQL, .. etc.)

Use SAS Data Step when:

  • You are querying/processing SAS tables,
  • You want to do iterative processing (ex. retaining values or using arrays).

Use Proc SQL when:

  • You are querying a large Database table,
  • You can do a SQL "Pass Through" where you send SQL code to be executed on the DB server and only the output is sent to SAS (instead of bringing the entire tables through the network to SAS and then filter it),

  • You want to query SAS Tables but prefer SQL joins to data step merges.

Another topic you should consider is efficiency programming; where you are optimising your query and look-ups.

0
votes

I find Proc SQL to be better for my use cases. We may need some more specifics on the size and variety of data your trying to join/export etc.

Give us some info on that and we can try to help.

Tips: Limit the fields your pulling over Subset data

0
votes

Anecdotally from my experience Proc SQL seems faster.

Here are two tips on speeding up queries with Proc SQL:

In general, you want to rule out as much data as possible when querying. If you are usingProc SQL, the order of the restrictions in the where clause matters. Put the most restrictive parts first.

For example, if I'm querying a database for teachers with the last name "JONES", that were hired after Jan 2005, I would structure my where clause like this: where last_name = 'JONES' and hire_date > 200501 I would do this because last name is likely to exclude more records than the hire date restriction.

When possible, don't use Select * instead, list out the specific columns that you need. Remember, even if you are doing a calculation with a column, you don't have to include that column in your select statement.

Here is a very useful resource for understanding how to use proc sql efficiently. I recommend reading it in it's entirety if you do a lot of work with large data sets in SAS.

http://www2.sas.com/proceedings/sugi29/127-29.pdf