Replacing missing values with closest non-missing value

Question

I have a data set with some missing values and I would like to replace those missing values with the following non-missing value OR if the value occurs in the last variable, then with the previous value.

Eg of data that I have:

x   var1 var2 var3 var4
e1   1    2    3    4
e2   .    .    5    7
e3   5    8    .    .
e4   2    3    1    9

Eg of data that I want:

x   var1 var2 var3 var4
e1   1    2    3    4
e2   **5****5**    5    7
e3   5    8    **8**    **8**
e4   2    3    1    9

I have tried the following code:

set have;
array t(*) var1--var4;
do _n_=1 to dim(t);
if t(_n_)=. then t(_n_)=coalesce(of t(*));
end;
run;```

However, this only replaces the missing value with the following one ie, if the missing value occurs in the var4 then it takes the value from var1 of that row (e3) instead of var2 from row e3.

Richard Richard · Accepted Answer · 2019-09-27T14:16:28

If I understand, the value from the next row will be brought into the current row, only when var1 is missing, otherwise missing values in the current row are propogations of the value to the left (even when that value itself is from a prior left to right propogation).

The next row retrieval, also known as lead, can be accomplished using a 1:1 reflexive merge with one self advanced by one row using option firstobs=.

data have; input
x& $8. var1 var2 var3 var4; datalines;
e1   1    2    3    4
e2   .    .    5    7
e3   5    8    .    .
e4   2    3    1    9
run;

data want;
  * reflexive 1:1 merge;
  merge
    have
    have(firstobs=2 keep=var1 rename=var1=lead1)
  ;

  if missing(var1) then var1=lead1;

  array v var1-var4;

  do _i_ = 2 to dim(v);
    if missing(v(_i_)) then v(_i_)=v(_i_-1);
  end;

  drop lead:;
run;

Replacing missing values with closest non-missing value

2 Answers