0
votes

I am trying to make some existing values to missing values (not deleting them). Here is the basic structure of my data set.

I want to treat AGE and GENDER as missing whenever A is less than B. For example, when A=1 and B=3, I want to treat values of AGE and GENDER on the last two rows as missing (as shown on the data sets).

In my data both A and B go from 1 to 4 and have every combination of them.

Asterisks mean I have more data between them. Thanks in advance!

BEFORE
    ID A B AGE GENDER
    --------------
    1  1 1 35  M
    *  * * *   *
    *  * * *   *
    5  1 2 23  F
    5  1 2 21  M
    6  1 2 42  F
    6  1 2 43  M
    *  * * *   *
    *  * * *   *
    20 1 3 43  F
    20 1 3 39  M
    20 1 3 23  M
    21 1 3 32  F
    21 1 3 39  M
    21 1 3 23  F
    *  * * *   *
    *  * * *   *
    55 2 4 32  M
    55 2 4 12  M
    55 2 4 31  F
    55 2 4 43  M
    *  * * *   *
    *  * * *   *

AFTER    
     ID A B AGE GENDER
     --------------
     1  1 1 35  M
     *  * * *   *
     *  * * *   *
     5  1 2 23  F
     5  1 2 .   .
     6  1 2 42  F
     6  1 2 .   .
     *  * * *   *
     *  * * *   *
     20 1 3 43  F
     20 1 3 .   .
     20 1 3 .   .
     21 1 3 32  F
     21 1 3 .   .
     21 1 3 .   .
     *  * * *   *
     *  * * *   *
     55 2 4 32  M
     55 2 4 12  M
     55 2 4 .   .
     55 2 4 .   . 
     *  * * *   *
     *  * * *   *
1
I added a variable called ID as well as a few more values to distinguish how observations are groups by variable B.Ken

1 Answers

5
votes

How about now?

data temp;
  retain idcount 0;
  set olddata;

  ** Create an observation counter for each id **;   
  prev_id = lag(id);

  if id ^= prev_id then idcount = 0;
  idcount = idcount + 1;

run;


** Sort the obs by ID in reverse order **; 
proc sort data=temp; 
    by id descending idcount;
run;

data temp2;
    retain misscount 0;
    set temp;
    by id descending idcount;

    ** Keep the previous age and gender **;
    old_age = age;
    old_gender = gender;

    ** Count the number that should be missing **;
    if a < b then nummiss = b - a;
    else nummiss = 0;

    ** Set a counter of obs that we will set to missing **;   
    if first.id then misscount = 0;

    ** Set the appropriate number of rows to missing and update the counter **;
    if misscount < nummiss then do;
       misscount = misscount + 1;
       call missing(age, gender);
    end;
run;

proc sort data=temp2 out=temp3(drop=misscount nummiss idcount prev_id);
by id idcount;
run;