I have a SAS data set that is about 8 gigabytes. Is there an easy way to read this data set in more quickly in a data step? It currently takes about 2 hours for the data step to complete.
2 Answers
Specific times and performance will depend on your hardware. However, some tips:

- `options compress=yes;` will compress the dataset, potentially saving large amounts of space (depending on the data). `options compress=char;` is another option, appropriate when character data makes up the majority of the space used.
- Limit the number of times you read through the data. Write your program so that it needs fewer data passes. Consider using views, and when combining datasets, use techniques like formats or hash objects instead of sorting and joining.
- Use `PROC PRINT` to view the data rather than browsing the dataset, as you can customize the results more effectively.
- If you are on a server, consider the `SPDE` engine, which allows you to spread the data across multiple disks.
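As a sketch of the first and last tips, this might look like the following (the library names and disk paths are placeholders, not from the question):

```sas
/* Enable compression for datasets created from this point on */
options compress=yes;          /* or compress=char for mostly-character data */

/* SPDE library spread across multiple disks: metadata on one path,
   data partitions distributed over the DATAPATH= locations */
libname speedy spde '/meta/path' datapath=('/disk1/data' '/disk2/data');

/* Rewriting the dataset into the compressed SPDE library */
data speedy.bigdata;
    set mylib.bigdata;         /* mylib.bigdata is a placeholder name */
run;
```

Compression trades a little CPU for less I/O, which is usually a net win on a dataset this size.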
Joe's answer and the other comments so far answer the question directly - they do a good job of covering ways of increasing read speeds. However, I think it's also worth mentioning a few potential means of cutting down the number of records that actually need to be read, as this will also speed up each pass.
This is necessarily quite a speculative answer, but depending on what your code's doing, some of these might be worth investigating further.
Indexes
If you only want to process a relatively small proportion of records (< 20%) in each pass through your dataset, using a series of similar and reasonably simple where clauses in successive passes, you could consider creating indexes for some of the variables in the where clauses. The smaller the proportion of records you are interested in, the greater the benefit from using indexes, as they allow SAS to skip reading large sections of your dataset.
There is a one-off initial processing overhead when creating an index, and a further cost each time you alter the dataset. The index will also occupy some additional disk space.
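As an illustration (dataset, library, and variable names are placeholders):

```sas
/* One-off: create a simple index on a variable used in where clauses */
proc datasets library=mylib nolist;
    modify bigdata;
    index create state;
quit;

/* SAS can now use the index to satisfy this where clause,
   reading only the matching observations */
data subset;
    set mylib.bigdata;
    where state = 'CA';
run;
```

Setting `options msglevel=i;` beforehand makes the log report whether the index was actually used for a given where clause.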
Obs and firstobs
If you're only interested in observations in a particular position within your dataset (e.g. from observation 10000 to observation 20000 inclusive), you can skip straight to them and ignore all the others via the firstobs= and obs= dataset options (in this case, firstobs=10000 and obs=20000, since obs= gives the number of the last observation to read, not a count).
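For the example above, that would look like this (dataset name is a placeholder):

```sas
/* Read only observations 10000 through 20000 inclusive */
data slice;
    set mylib.bigdata (firstobs=10000 obs=20000);
run;
```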
Sortedby
If the dataset happens to be sorted by a variable used in a where clause, you can also use the sortedby= option on a set statement to speed up where clause processing.
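A minimal sketch, assuming the data is already sorted by a hypothetical `account_id` variable:

```sas
data subset;
    /* sortedby= asserts the existing sort order; SAS can then stop
       reading once the where range has been passed */
    set mylib.bigdata (sortedby=account_id);
    where account_id between 1000 and 2000;
run;
```

Note that sortedby= only asserts the order; SAS does not verify it, so the claim must actually be true for the results to be correct.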