
I have a SAS macro that runs over the entire dataset and does some analysis. During the analysis, each instance is handled one at a time, so the entire dataset can be processed while we keep an eye on the log file.

However, I would like to split the dataset into several parts (e.g., observations 1-500, 501-1000, etc.), so that the engine stops after running 500 instances and then starts again. In the end, the results should be combined into one table as before. How can I add this 'split' step to my existing code?

Initial Code:

 %macro mymac;
 OPTIONS NOTES SOURCE SOURCE2 MPRINT MLOGIC MERROR SYMBOLGEN;

 /* Part A starts*/

 data _null_;
 set WORK.LOCATION end=last;
    if last then call symput('nfiles',_n_);
 run;

 %do i=1 %to &nfiles;

 data _null_;
 set oriework.PO_LOC;
    if &i=_n_ then call symput('code',LOCATION_ID);
 run;

 /* Part A ends */

 %put &code;

 proc sql; 

 create table WORK.pt as select
 ......

 quit;

 %if %sysfunc(exist(WORK.result)) %then %do;
 data WORK.result;
 set WORK.result WORK.pt;
 run;
 %end;
 %else %do;
 data WORK.result;
 set WORK.pt;
 run;
 %end;


 %end;

 %mend;

 %mymac;

Here 'WORK.LOCATION' is the dataset I query in the 'proc sql' step; it contains all the 'LOCATION_ID' information that I need.

Part A is where the macro iterates from the first observation to the last. Can I replace it with a data-splitting step so that each block of 500 observations is run together and the results are finally combined into one table?

Thank you!

Without seeing what your proc sql step is doing, it's hard to give any pointers, nor is it clear what you're trying to achieve. Nonetheless, BY processing is likely a better candidate for what I think you're trying to do. Also, appending via a data step is far less efficient than using proc append. - Chris J
Thanks Chris. I checked the "by" option, as in %do i=1 %to 100 %by 5; will it run observations 1, 6, 11, ..., 96 and leave the rest unprocessed? I didn't include my proc sql part because it is about 200 lines long and contains a lot of calculations. - Crubal Chenxi Li
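
For reference, the macro %by question above can be checked directly. The small sketch below (the macro name show_by is made up for illustration) only writes the loop index to the log; it suggests that %by 5 does skip the values in between, which is different from the data step BY-group processing that the first comment most likely refers to.

```sas
/* Illustration only: show_by is a hypothetical macro name */
%macro show_by;
    %local i;
    /* The index starts at 1 and is incremented by 5 on each pass */
    %do i = 1 %to 100 %by 5;
        %put i=&i;
    %end;
%mend show_by;

%show_by;
/* Log shows i=1, i=6, i=11, ..., i=96; values such as 2-5 are never visited */
```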

1 Answer


Wrap your proc sql step in a second macro %do loop that steps through the dataset in batches, using macro variables for a batch counter, the first observation of the next batch, and the last observation of that batch. You can then use these in dataset options via firstobs= and obs=, e.g. (firstobs=&startobservation obs=&lastobservation), and put the counter macro variable in the output dataset name. Note that obs= is the number of the last observation to read, not a count of observations, so the batch 501-1000 is (firstobs=501 obs=1000). This also works in proc sql: just add the options after the table name in your from or join clause. Afterwards, append the datasets using a data step, proc append, or proc sql insert into.
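
A minimal sketch of that batching loop, under some assumptions: the macro name run_batches, the batch= parameter, and the WORK.pt_&b naming are made up for illustration, and the select is only a placeholder for the real (elided) analysis query. It is meant to show the firstobs=/obs= arithmetic, not the asker's actual calculation.

```sas
/* Sketch only: run_batches, batch=, and WORK.pt_&b are hypothetical names */
%macro run_batches(batch=500);
    %local nobs nbatch b first last;

    /* Remove any result left over from an earlier run */
    proc datasets library=work nolist nowarn;
        delete result;
    quit;

    /* Count the rows in WORK.LOCATION */
    proc sql noprint;
        select count(*) into :nobs from WORK.LOCATION;
    quit;
    %let nobs = &nobs;   /* strips the leading blanks added by INTO */

    /* Number of batches, rounded up using integer arithmetic */
    %let nbatch = %eval((&nobs + &batch - 1) / &batch);

    %do b = 1 %to &nbatch;
        %let first = %eval((&b - 1) * &batch + 1);
        %let last  = %eval(&b * &batch);

        /* firstobs= is the first row to read; obs= is the LAST row, not a count */
        proc sql;
            create table WORK.pt_&b as
            select loc.*   /* placeholder: the real analysis query goes here */
            from WORK.LOCATION (firstobs=&first obs=&last) as loc;
        quit;

        /* proc append creates WORK.result on the first pass, then adds to it */
        proc append base=WORK.result data=WORK.pt_&b;
        run;
    %end;
%mend run_batches;

%run_batches(batch=500);
```

The last batch may ask for rows past the end of the table; obs= simply stops at the final row in that case, so no special handling is needed here.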