2
votes

I have a data set with some missing values and I would like to replace those missing values with the following non-missing value OR if the value occurs in the last variable, then with the previous value.

Eg of data that I have:

x   var1 var2 var3 var4
e1   1    2    3    4
e2   .    .    5    7
e3   5    8    .    .
e4   2    3    1    9

Eg of data that I want:

x   var1 var2 var3 var4
e1   1    2    3    4
e2   **5****5**    5    7
e3   5    8    **8**    **8**
e4   2    3    1    9

I have tried the following code:

set have;
array t(*) var1--var4;
do _n_=1 to dim(t);
if t(_n_)=. then t(_n_)=coalesce(of t(*));
end;
run;```

However, this only replaces the missing value with the following one ie, if the missing value occurs in the var4 then it takes the value from var1 of that row (e3) instead of var2 from row e3.
2

2 Answers

1
votes

If I understand, the value from the next row will be brought into the current row, only when var1 is missing, otherwise missing values in the current row are propogations of the value to the left (even when that value itself is from a prior left to right propogation).

The next row retrieval, also known as lead, can be accomplished using a 1:1 reflexive merge with one self advanced by one row using option firstobs=.

data have; input
x& $8. var1 var2 var3 var4; datalines;
e1   1    2    3    4
e2   .    .    5    7
e3   5    8    .    .
e4   2    3    1    9
run;

data want;
  * reflexive 1:1 merge;
  merge
    have
    have(firstobs=2 keep=var1 rename=var1=lead1)
  ;

  if missing(var1) then var1=lead1;

  array v var1-var4;

  do _i_ = 2 to dim(v);
    if missing(v(_i_)) then v(_i_)=v(_i_-1);
  end;

  drop lead:;
run;
0
votes

An intuitive approach is simply to loop through the array until you encounter a missing value. Then loop through the remaining part of the array looking for the next non-missing value. If the missing value occurs at the end (more precisely: with no non-missing values in the remaining part of the array), we will still have missing values at the end of these loops.

We can then do the same procedure in reverse, starting at the end of the array and working our way to the start.

I'd avoid using _n_ as a variable name, as it is an automatic variable in SAS.

data want;
    set have;
    array t(*) var1--var4;
    /*  Following value*/
    do n=1 to dim(t)-1;
        inner=n;
        do while (t(n)=. and inner lt dim(t));
            t(n)=t(inner+1);
            inner+1;
        end;
    end;
    /*  If there was no following value, we still have missing values, and finds previous instead*/
    do n=dim(t) to 2 by -1;
        inner=n;
        do while (t(n)=. and inner gt 0);
            t(n)=t(inner-1);
            inner+ (-1);
        end;
    end;
    drop n inner;
run;