How to retain original value of variable not in BY group when using UPDATE to collapse observations

Question

I have a list of people with medical visits on specific dates, and I've already flagged them for specific diagnoses. Now I'm trying to collapse these flags to create a file with one person/one date per line. I've found a solution that partially works, using UPDATE (found here: SAS collapse dates). However this overwrites the flagged diagnosis code with the last diagnosis code for a person on a specific date. Here is a simplified version of my data:

data have;
input id id_date diag_code $ flag;
datalines;
1 1 a .
1 1 b 1
1 1 c .
1 2 d 1
1 2 e .
1 2 f 1
2 1 g .
2 1 h .
2 1 i 1
2 2 j 1
3 1 k . 
;
run;

data want;
    update have (obs=0) have;
    by id id_date;
run;

Output:

                        diag_
Obs    id    id_date    code     flag

 1      1       1         c        1  
 2      1       2         f        1  
 3      2       1         i        1  
 4      2       2         j        1 
 5      3       1         k        .

What I'm trying to get is:

                        diag_
Obs    id    id_date    code     flag

 1      1       1         b        1  
 2      1       2         d        1  
 3      2       1         i        1  
 4      2       2         j        1 
 5      3       1         k        .

So basically, I'd like to keep the diag_code from the first observation that has flag=1. I've tried the RENAME option to prevent overwriting the variable, but since UPDATE first reads in a dataset with 0 observations, the (new) original variable shows up but is empty:

data want;
    update have (obs=0 rename=(diag_code=orig_diag_code)) have;
    by id id_date;
run;

                        orig_
                        diag_              diag_
Obs    id    id_date    code     flag      code

 1      1       1                  1       c  
 2      1       2                  1       f  
 3      2       1                  1       i  
 4      2       2                  1       j
 5      3       1                  .       k

Any ideas?

Edit: Adding where flag = 1 still erroneously shows the last diagnosis for when multiple flags occur, and does not produce observations on dates when the flag is missing:

data want;
    update have (obs=0) have;
    by id id_date;
    where flag=1;
run;

                        diag_
Obs    id    id_date    code     flag

 1      1       1         b        1  
 2      1       2         f        1  
 3      2       1         i        1  
 4      2       2         j        1

Is this not just an issue of using a (where = (flag = 1)) statement? — AdamO
@AdamO I suspect that would work for this use case, but not generalize, OP can you please confirm? Would filter on flag=1 be a solution? If not, can you expand your example data to include that scenario? — Reeza
Just tried it out. It works only if there aren't multiple observations with flag =1. The diag_code variable then shows b / f / i / j, since both d and f are flag = 1, and f is the last one. Also if there are observations without flags those won't be listed, and that also occurs in my dataset (but not listed here in this example). — HHJ
Edited question to include partial answer by @AdamO, and included observation in example that doesn't have a flag — HHJ

Richard Richard · Accepted Answer · 2019-07-13T12:37:25

The first row in a group meeting some criteria can be selected using flag (or state maintenance) variables.

Here are two ways:

DOW loop over group, does not require retain or first.
Only implicit loop with retain, first. and last.

Code:

* DOW over group with last., no retain or first.;
data want;
  do _n_ = 1 by 1 until (last.id_date);
    set have;
    by id id_date;
    if flag and not flagged then do;
      output;
      flagged = _n_;
    end;
  end;
  if not flagged then output;

  drop flagged;
run;

* Only implicit loop with retain, first. and last.;
data want;
  retain flagged;
  drop flagged;

  set have;
  by id id_date;

  if first.id_date then flagged = .;

  if flag and not flagged then do;
    flagged = 1;
    output;
  end;

  if last.id_date and not flagged then
    output;
run;

How to retain original value of variable not in BY group when using UPDATE to collapse observations

2 Answers