1
votes

I have a large SAS dataset that includes patient ID and race. This is a longitudinal dataset where each observation represents a visit to the hospital. There are many observations that are missing race information, but other visits for that same patient ID have race indicated. I used the code below to resolve any observations for a given patient ID that was missing race, as long as another visit had that information:

data need;
   do until (last.id);
      set have;
      by id;
      if not missing(race) then newrace=race;
      if missing(race) then race=newrace;
      output;
   end;
run;

My question is - how do I record when a patient has multiple race's indicated across options? How do I determine one to be more dominant/overriding than the others (i.e. for Patient 342, there are 3 obs with race=2 and 2 obs with race=4; we want any indication of race=4 to determine that newrace=4 for all obs with Patient 342).

Thanks!

2
I'm a little confused, what is the logic for picking race=4 in your example? Do you always pick the one with the highest value? Once I know that, a solution will be straight forward.DomPazz
In the patient database, race=(1,2,3,4,5) and race=4 for Hispanic. In the population we are looking at, any indication of Hispanic (even if other observations for the same patient indicate another race) overrides other racial indications. Another example of when this occurs is if a patient is multi-racial and may indicate race=2 at first visit, race=4 at second visit and race=5 at third visit. Overall, I want to be able to recode all race for this patient to =4.user3566853

2 Answers

0
votes

The way I would do it is to create a format for the patient IDs. This not only solves your immediate problem, but might be useful in other steps as it can be used in procs.

data for_fmt;
set have;
by id;
retain label;
retain fmtname 'IDRACEF';
start=id;
if race=4 then label=4; *or you could have label='Hispanic', also - could use this to convert to character strings;
else label=coalesce(label,race); *otherwise only change race if label is missing;
if last.id then output;
keep start label fmtname hlo;
if _n_=1 then do;
 start=.;
 label=.; *or 'MISSING' or something else indicating a nonmatch;
 hlo='o';
 output;
end;
run;

proc format cntlin=for_fmt;
quit;

Then you can use IDRACEF. as a format, either with format on the column (such as in a proc means), or with a put statement.

0
votes

This answer assumes that 4 is the overriding race, and that, if an id has more than one race but one of them is 4, all race values are replaced with 4. Also if a given id has more than one race none of which is 4, this code essentially picks which one will replace missing values at random.

data races (drop=race);
do until (last.id);
set have;
by id;
if newrace ne 4 then newrace = race;
end;
output;
run;

data need (drop=newrace);
merge have races;
by id;
if missing(race) then race=newrace;
if newrace = 4 then race = 4;
run;

The first part creates a data set "races" for which race should replace missings for each id. The second merges that into the original set and replaces missings with the race from "races".