0
votes

I used following code to automatic split datasets within one library.

  1. Firstly, I use proc sql to put all dataset into a table and number them.
  2. Secondly, I also use proc sql to read the content in each dataset in order to set rules for split. More specifically, the split process is based on two variable: date_l_ and _ric. Obviously, date_l_ is a date variable. And _ric is a variable that identify the name of stock. The results of this step is shown as following:

enter image description here

  1. Finally, I use %do j=1 %to &obs. to split the dataset.

The following are my codes.

%macro split(sourcelib=,source=,result=);
proc sql noprint;  /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib 
order by memname ;

select count(memname) 
into:numb 
from mytables;

%let numb=&numb.;  /*give a number to datasets in the library*/

select memname
into :memname1-:memname&numb.
from mytables;
quit;

%do i=1 %to &numb.;
proc sql noprint;
create table tmp&i as
select distinct date_l_, _ric
from &source.&&memname&i;

select count(*)
into :obs
from work.tmp&i.;

%let obs=&obs.; /*read the variable 'date_l_' and '_ric' in each dataset*/

select date_l_, _ric, catx("_", "&result.", substr(_ric, 1, 13), date_l_)
into :date_l_1-:date_l_&obs., :ric1-:ric&obs., :setname1-:setname&obs.
from work.tmp&i;
quit;
%end;

data 
%do j = 1 %to &obs.;   /*set rules for separated dataset*/
&&setname&j
%end;
;
set
%do i=1 %to &numb.;
&source.&&memname&i
%end;
; 
select;
%do j = 1 %to &obs.;
when(_ric = "&&ric&i" and date_l_ = &&date_l_&i) output &&setname&j;
%end;
end;
%mend;

%split(sourcelib='DATA',source=DATA.,result=AXP.);

However, I faced to the error as shown following: enter image description here According to Problem note: If SAS has to acquire memory in order to process a direct access bound library, and memory has already been exhausted, error messages might occur in the SASLOG.

Since I have around 100 dataset and contrain various data and RIC (variable name), it is impossible to split the dataset manually. In this case, how could I improve my code to improve this code?

2
It's more likely you're out of disk space.david25272
disk space? I thought it was the problem of memory. Is that possible solve the problem by improving/changing code ?Neal801
Didn't you also post this at communities.sas.com? I was pretty sure I posted a better option there.Reeza
is that the way based on call execute? I've tried. It's also out of memoryNeal801
It shouldn't since each step is running independently. Do you have physical space to store all the data?Reeza

2 Answers

1
votes

This is a disk space problem. The "I/O" dataset error tends to occur not because of memory issues, but because of disk space. Your dataset has a lot of repeating values, which is good for you in terms of compression. Three things you want to do:

1. Set the column lengths to only what is needed

This can be done for each variable with attrib or length statements. A way I've found useful is using the %squeeze() macro on the initial dataset, saving those attributes to a separate empty dataset, then using create table like to retain those attributes without having to run %squeeze() again. For example:

/* Shrink the dataset */
%squeeze(everything, everything_squeezed);

/* Save a copy of all these attributes in an empty dataset */
proc sql noprint;
    create table everything_attribs like everything_squeezed;
quit;

This can be useful if your full dataset is updated or overwritten regularly, losing your attributes. Use everything_attribs as the first table in your set statements.

2. Use the compress dataset option

Compressing your dataset can save incredible amounts of space. SAS uses two algorithms: one is good for character variables, one is good for numeric. In your case, try both and see which gives the best results.

%do i=1 %to &numb.;
proc sql noprint;
create table tmp&i(compress=yes) as
select distinct date_l_, _ric
from &source.&&memname&i;

select count(*)
into :obs
from work.tmp&i.;

%let obs=&obs.; /*read the variable 'date_l_' and '_ric' in each dataset*/

select date_l_, _ric, catx("_", "&result.", substr(_ric, 1, 13), date_l_)
into :date_l_1-:date_l_&obs., :ric1-:ric&obs., :setname1-:setname&obs.
from work.tmp&i;
quit;
%end;

data 
%do j = 1 %to &obs.;   /*set rules for separated dataset*/
&&setname&j(compress=yes)
%end;
;
set
%do i=1 %to &numb.;
&source.&&memname&i
%end;
; 
select;
%do j = 1 %to &obs.;
when(_ric = "&&ric&i" and date_l_ = &&date_l_&i) output &&setname&j;
%end;
end;
%mend;

3. Check how much memory is being used

Turn on the system option fullstimer to get a detailed view of the amount of memory being used for each step. If too much memory is being used, you will need to do different approaches that use less total memory.

For example, you are outputting everything in a single data step. You could convert that into multiple individual data steps rather than one large one.

0
votes

The key line in your photograph of the SAS log is the one that says:

NOTE: Table WORK.TMP4 created, with 17329 rows and 2 columns.

You cannot create seventeen thousand datasets in one data step. Why would you even want to do that? You will either need to do this in multiple data steps or better still find a different solution to your original problem.