1
votes

Is it possible to repeat a data step a number of times (like you might in a %do-%while loop) where the number of repetitions depends on the result of the data step?

I have a data set with numeric variables A. I calculate a new variable result = min(1, A). I would like the average value of result to equal a target and I can get there by scaling variable A by a constant k. That is solve for k where target = average(min(1,A*k)) - where k and target are constants and A is a list.

Here is what I have so far:

filename f0 'C:\Data\numbers.csv';
filename f1 'C:\Data\target.csv';

data myDataSet;
    infile f0 dsd dlm=',' missover firstobs=2;
    input A;
    init_A = A; /* store the initial value of A */
run;

/* read in the target value (1 observation) */
data targets;
    infile f1 dsd dlm=',' missover firstobs=2;
    input target;
    K = 1; * initialise the constant K;
run;

%macro iteration; /* I need to repeat this macro a number of times */
    data myDataSet;
        retain key 1;
        set myDataSet;
        set targets point=key;

        A = INIT_A * K; /* update the value of A /*
        result = min(1, A);
    run;

    /* calculate average result */
    proc sql;
        create table estimate as 
        select avg(result) as estimate0
        from myDataSet;
    quit;

    /* compare estimate0 to target and update K */
    data targets;
        set targets;
        set estimate;

        K = K * (target / estimate0);
    run;
%mend iteration;

I can get the desired answer by running %iteration a few times, but Ideally I would like to run the iteration until (target - estimate0 < 0.01). Is such a thing possible?

Thanks!

2
if you got Target, estimate0 or the target-estimate0 total into a macro variable or variables. Then you could use a %do %until loop within your macro until (%eval) the total was <0.01Jay Corbett

2 Answers

1
votes

I had a similar problem to this just the other day. The below approach is what I used, you will need to change the loop structure from a for loop to a do while loop (or whatever suits your purposes):

First perform an initial scan of the table to figure out your loop termination conditions and get the number of rows in the table:

data read_once;
  set sashelp.class end=eof;

  if eof then do;
    call symput('number_of_obs', cats(_n_) );    
    call symput('number_of_times_to_loop', cats(3) );
  end;
run;

Make sure results are as expected:

%put &=number_of_obs;
%put &=number_of_times_to_loop;

Loop over the source table again multiple times:

data final;

  do loop=1 to &number_of_times_to_loop;
    do row=1 to &number_of_obs;
      set sashelp.class point=row;
      output;
    end;
  end;

  stop; * REQUIRED BECAUSE WE ARE USING POINT=;

run;
0
votes

Two part answer.

First, it's certainly possible to do what you say. There are some examples of code that works like this available online, if you want a working, useful-code example of iterative macros; for example, David Izrael's seminal Rakinge macro, which performs a rimweighting procedure by iterating over a relatively simple process (proc freqs, basically). This is pretty similar to what you're doing. In the process it looks in the datastep at the various termination criteria, and outputs a macro variable that is the total number of criteria met (as each stratification variable separately needs to meet the termination criterion). It then checks %if that criterion is met, and terminates if so.

The core of this is two things. First, you should have a fixed maximum number of iterations, unless you like infinite loops. That number should be larger than the largest reasonable number you should ever need, often by around a factor of two. Second, you need convergence criteria such that you can terminate the loop if they're met.

For example:

data have;
  x=5;
run;

%macro reduce(data=, var=, amount=, target=, iter=20);
   data want;
     set have;
   run;
   %let calc=.;
   %let _i=0;
   %do %until (&calc.=&target. or &_i.=&iter.);
     %let _i = %eval(&_i.+1);
      data want;
        set want;
        &var. = &var. - &amount.;
        call symputx('calc',&var.);
      run;
   %end;
   %if &calc.=&target. %then %do;
     %put &var. reduced to &target. in &_i. iterations.; 
   %end;
   %else %do;
     %put &var. not reduced to &target. in &iter. iterations.  Try a larger number.;
   %end;
%mend reduce;

%reduce(data=have,var=x,amount=1,target=0);

That is a very simple example, but it has all of the same elements. I prefer to use do-until and increment on my own but you can do the opposite also (as %rakinge does). Sadly the macro language doesn't allow for do-by-until like the data step language does. Oh well.


Secondly, you can often do things like this inside a single data step. Even in older versions (9.2 etc.), you can do all of what you ask above in a single data step, though it might look a little clunky. In 9.3+, and particularly 9.4, there are ways to run that proc sql inside the data step and get the result back without waiting for another data step, using RUN_MACRO or DOSUBL and/or the FCMP language. Even something simple, like this:

data have;
  initial_a=0.3;
  a=0.3;
  target=0.5;
  output;
  initial_a=0.6;
  a=0.6;
  output;
  initial_a=0.8;
  a=0.8;
  output;
run;

data want;
  k=1;
  do iter=1 to 20 until (abs(target-estimate0) < 0.001);
    do _n_ = 1 to nobs;
      if _n_=1 then result_tot=0;
      set have nobs=nobs point=_n_;
      a=initial_a*k;
      result=min(1,a);
      result_tot+result;
    end;
    estimate0 = result_tot/nobs;
    k = k * (target/estimate0);
  end;
  output;
  stop;
run;

That does it all in one data step. I'm cheating a bit because I'm writing my own data step iterator, but that's fairly common in this sort of thing, and it is very fast. Macros iterating multiple data steps and proc sql steps will be much slower typically as there is some overhead from each one.