1
votes

I was wondering, if anybody has a quick and easy way to collapse transactional data into one observation for easier modelling processing.

For example, let's say we look at a negotiation with a customer, every record is a quote for a certain car model with options A, B and C (all nominal indicators). The last record indicates a sale.

DATA TEMPSET; 
    INPUT  CUST_ID $ A $ B $ C $;
    DATALINES; 
        01  1   0   3
        01  1   1   0
        01  1   1   3
        01  0   1   3
        02  0   0   2
        02  1   0   2
        02  1   1   2
        02  1   2   2
        02  0   2   2
        ;
RUN; 

To make things easier I would love to have one the resulting dataset to look like:

CUST_ID A   B   C   A-1 B-1 C-1 A-2 B-2 C-2 A-3 B-3 C-3 A-4 B-4 C-4
01      1   0   3   1   1   0   1   1   3   0   1   3    .  .   .
02      0   0   2   1   0   2   1   1   2   1   2   2   0   2   2

My approach was a two dimensional array to create the variables. But then I could not combine it with a DO loop, trying to assign assign the values since it has multiple obs. I also tried using macro variables with SYMPUT/SYMGET and then LAST.CUst_ID = 1 to trigger the output, still with the problems of not having always the same length of quote history as well as requireing a hardcoding for each variable, which is practical for three variables, but not with the number increases. Any suggestions are welcome, probably possible with PROC SQL in a much simpler fashion?

Thanks!

1
Take a look into PROC TRANSPOSEEvan Volgas

1 Answers

1
votes

PROC TRANSPOSE is your friend here. Make a vertical dataset with the wanted name and value, and transpose.

DATA TEMPSET; 
    INPUT  CUST_ID $ A $ B $ C $;
    DATALINES; 
        01  1   0   3
        01  1   1   0
        01  1   1   3
        01  0   1   3
        02  0   0   2
        02  1   0   2
        02  1   1   2
        02  1   2   2
        02  0   2   2
        ;
RUN; 

data tempset_i;
set tempset;
by cust_id;
if first.cust_id then row=0;
row+1;
array vars a b c;
do _i = 1 to dim(vars);
  varname = cats(vname(vars[_i]),row);
  value = vars[_i];
  output;
end;
keep cust_id varname value;
run;

proc transpose data=tempset_i out=tempset_t(drop=_name_);
by cust_ID;
id varname;
var value;
run;