1
votes

I have a dataset which has three variables: Application number, decline code and sequence. Now, there may be multiple decline code for a single application(which will have different sequence number). So the data looks like following:

Application No       Decline Code      Sequence 

1234                      FG              1
1234                      FK              3
1234                      AF              2
1256                      AF              2
1256                      FK              1
. 
. 
. 
. 

And so on

So, I have to put this in wide format such that the first column contains unique application numbers and corresponding to each of them is their decline code(I don't need sequence number, just that decline codes should appear in order of their sequence number from left to right, separated by a comma). Something like below

Application Number            Decline Code
1234                           FG, AF, FK
1256                           FK, AF
.......... 
......... 

And so on

Now I tried ruining proc transpose by application number on SAS. But the problem is that it creates multiple columns with all the decline codes listed and then if a certain decline code doesn't apply for an application, it will show . in that. So their are many missing values and it isn't quite the format I am expecting. Is there any way to do this in SAS or sql?

1

1 Answers

2
votes

PROC TRANSPOSE can certainly help here; then you can CATX the variables together if you really just want one variable:

data have;
input ApplicationNo       DeclineCode  $    Sequence ;
datalines;
1234                      FG              1
1234                      FK              3
1234                      AF              2
1256                      AF              2
1256                      FK              1
;;;;
run;
proc sort data=have;
  by ApplicationNo Sequence;
run;
proc transpose data=have out=want_pre;
  by ApplicationNo;
  var DeclineCode;
run;

data want;
  set want_pre;
  length decline_codes $1024;
  decline_codes = catx(', ',of col:);
  keep ApplicationNo decline_codes;
run;

You could also do this trivially in one datastep, using first and last checks.

data want_ds;
  set have;
  by ApplicationNo Sequence;
  retain decline_codes;
  length decline_codes $1024; *or whatever you need;
  if first.ApplicationNo then call missing(decline_codes);
  decline_codes = catx(',',decline_codes, DeclineCode);
  if last.ApplicationNo then output;
run;