SAS proc iml use to impute the missing sales from the first or second month based on the average of the next two observations

Question

I'm looking for a way to impute, using PROC IML in SAS, the average of the sales of the next two months. As you can see, sometimes the sales for 201901 are missing and sometimes the sales for 201902 are missing. For example, for the first barcode I want sales[1] = mean(sales[2], sales[3]), and I want to do this for each unique barcode.

The "table A" is like this:

Obs |   Barcode   |  date  | sales | Position
----------------------------------------------
1   | 21220000000 | 201901 |   .   |   1
2   | 21220000000 | 201902 |  311  |   2
3   | 21220000000 | 201903 |  349  |   3
4   | 21220000000 | 201904 |  360  |   4
5   | 21220000000 | 201905 |  380  |   5
6   | 21220000000 | 201906 |  440  |   6
7   | 21220000000 | 201907 |  360  |   7
8   | 21220000000 | 201908 |  390  |   8
9   | 21220000000 | 201909 |  410  |   9
10  | 21220000000 | 201910 |  520  |  10
11  | 21220000000 | 201911 |  410  |  11
12  | 21220000000 | 201912 |  390  |  12
13  | 31350000000 | 201901 |  360  |   1
14  | 31350000000 | 201902 |   .   |   2
                  etc.
24  | 31350000000 | 201912 |   .   |  12
25  | 45480000000 | 201901 |  310  |   1
26  | 45480000000 | 201902 |   .   |   2
                  etc.

I tried something like this, but it doesn't work:

proc iml;
t_a= TableCreateFromDataSet("work","table_a");
call TablePrint(t_a); 

do i =1 to nrow(t_a);
  
      if t_a[i,4]=. and t_a[i,5]=1 then t_a[1,4]= mean(t_a[i+1,4],t_a[i+2,4]) ;
      
   i=i+1;
end;
run;

Is there a way to do it using matrices or lists in proc iml or would you recommend any other ways? Thank you in advance!

No! It is just one way I thought of doing it. Maybe it would be easier using lag(), but I wanted to avoid the extra columns. — Student1212

Rick · Accepted Answer · 2021-01-14T18:36:01

This problem only involves an ID variable (='BarCode') and a variable that has missing values (='Sales'), so you really only need to read and process two vectors.

An efficient approach is to iterate over the unique levels of the "Barcode" variable (an ID variable) and process each missing value. Thus you can reduce the problem to a "BY group analysis" in which each ID value is processed in turn. There are several ways to perform a BY-group analysis in IML. The easiest to understand and implement is the UNIQUE-LOC technique. For large data, the UNIQUEBY technique is more efficient.

The following example uses the UNIQUE-LOC technique:

proc iml;
use table_a;
   read all var {"BarCode"} into ID;
   read all var {"Sales"} into X;
close;

imputeX = X;             /* make copy of X */
u = unique(ID);          /* unique categories of the ID variable */
do i = 1 to ncol(u);     /* for each ID level */
   groupIdx = loc(ID=u[i]);
   y = x[groupIdx];      /* get the values for this level */
   k = loc( y=. );       /* which are missing? */
   if ncol(k)>0 then do; /* if some are missing, do imputation */
      n = nrow(y);
      startIdx = ((k+1) >< n);  /* starting location, don't exceed n */
      stopIdx  = ((k+2) >< n);  /* ending location, don't exceed n */
      values = y[ startIdx ] || y[ stopIdx ];
      rowMean = values[ , :];   /* mean of each row; the ':' reduction skips missing values */
      y[k] = rowMean;           /* copy means to missing values */
      imputeX[groupIdx] = y;    /* update imputed vector (optional: write data) */
   end;
end;

print ID[F=Z11.] X imputeX;
quit;
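
For comparison, here is a rough sketch of the same imputation using the UNIQUEBY technique mentioned above. It is not part of the original answer. It assumes that table_a is already sorted by BarCode (and by date within each barcode), because UNIQUEBY expects sorted data. The names firstRow, lastRow, and nextIdx are made up for illustration.

```sas
proc iml;
use table_a;
   read all var {"BarCode"} into ID;
   read all var {"Sales"} into X;
close;

/* ASSUMPTION: rows are sorted by BarCode, then by date */
imputeX = X;                     /* make copy of X */
b = uniqueby(ID, 1);             /* first row of each BarCode group */
b = b // (nrow(ID)+1);           /* append n+1 so the last group has an end marker */
do i = 1 to nrow(b)-1;           /* for each BarCode group */
   firstRow = b[i];
   lastRow  = b[i+1] - 1;        /* last row of this group */
   do j = firstRow to lastRow;
      if X[j]=. then do;
         /* next two rows, clamped so they never leave the group */
         nextIdx = ((j+1) >< lastRow) || ((j+2) >< lastRow);
         imputeX[j] = mean( X[nextIdx] );   /* MEAN skips missing values */
      end;
   end;
end;

print ID[F=Z11.] X imputeX;
quit;
```

As in the UNIQUE-LOC version, a missing value in the last row of a group has no later observations to average, so it stays missing. If one of the next two values is itself missing, the mean is taken over the nonmissing one.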
