1
votes

I have a SAS dataset (change_detail) that maintains change variables and looks like this (except there are over 200 change variables in the actual table):

Reference Chg1Code Chg1Desc Chg2Code Chg2Desc Chg3Code Chg3Desc
--------- -------- -------- -------- -------- -------- --------
1234                        C        Change2
1234      C        Change1  C        Change2
1234      C        Change1                    C        Change3
2345      C        Change1                    C        Change3
2345                                          C        Change3
3456                        C        Change2  C        Change3
3456      C        Change1  C        Change2
3456      C        Change1
3456      C        Change1                    C        Change3  

I need to distill this information into a summary of the changes that have occurred to each reference number. When I used the following PROC SQL in SAS:

proc sql;
create table work.changedata_summary as 
select distinct 
      a.Reference, 
      catx(',',a.Chg1Desc,a.Chg2Desc,a.Chg3Desc) as Changes length=1000
from
      work.change_detail a
;
quit;

It concatenates the individual Change Description variables, but maintains the original number of observations per Reference Number. The results I get are:

Reference Changes
--------- -------
1234      Change2
1234      Change1,Change2
1234      Change1,Change3
2345      Change1,Change3
2345      Change3
3456      Change2,Change3
3456      Change1,Change2
3456      Change1
3456      Change1,Change3

I need some help in flattening this data out (using either PROC SQL or a DATA step) to show just the summary results by distinct Reference number. The expected result is:

Reference Changes
--------- -----------------------
1234      Change1,Change2,Change3
2345      Change1,Change3
3456      Change1,Change2,Change3

Any help provided will be appreciated! Thanks!

3
Can you have only 3 changes or is this just a small sample of your data?Reeza
This is just a small sample. The actual dataset contains change data on over 200 different variables, so I need a scalable solution.Brian S
Why SQL? Would be much easier in a data step.Tom

3 Answers

0
votes

if your data is something like this then adding a group by and max to your code should work

  proc sql;
  create table work.changedata_summary as 
  select  
        Reference, 
  catx(',',max(chg1Desc),max(Chg2Desc), max(Chg3Desc)) as Changes length=1000
  from
  have 
  group by Reference
  ;
  quit;
0
votes

I wouldn't even try to do such a thing in PROC SQL. If you just want to see the last non-empty change for each variable then use a data step with an UPDATE statement to collapse the data to one observation per group.

data want ;
  update have (obs=0) have ;
  by reference ;
  length changes $500 ;
  changes=catx(',',of change1-change3);
run;
0
votes

I would use Data step for this kind of operation

Try this out

data want;

length change_tot $200.;

   do until (last.Reference);

      set have;

        by Reference notsorted;

      change_tot=catx(',',change_tot,Chg1Desc, Chg2Desc, Chg3Desc);

   end;

   Keep reference change_tot;

run;