
I have a dataset that I want to transpose from long to wide. I have:

    ID    Question       Answer
    1     Referral to    a
    1     Referral to    b
    1     Referral to    d
    2     Referral to    a
    2     Referral to    c
    4     Referral to    a
    6     Referral to    a
    6     Referral to    c
    6     Referral to    d

What I want the transposed dataset to look like:

    ID    Referral to
    1     a, b, d
    2     a, c
    4     a
    6     a, c, d

I've tried to transpose the data, but the resulting dataset contains only one of the responses from the Answer column, not all of them.

Code I've been using:

proc transpose data=test out=test2 let;
    by ID;
    id Question;
    var Answer;
run;

The dataset has hundreds of thousands of rows, with dozens of variables that work exactly like the 'Referral to' example. How can I make the transposed wide dataset contain all of the answers to a question in the same cell, not just one? Any help would be appreciated.

Thank you.


1 Answer


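Here are two methods you can use in this case. Your PROC TRANSPOSE keeps only one answer because every row within an ID has the same Question value; the LET option allows that duplicate ID value but transposes only its last occurrence in each BY group. The first method is a single DATA step that uses RETAIN and BY-group processing. The second is more dynamic: it uses PROC TRANSPOSE and then CATX() to build the combined variable. Note the PREFIX= option in the transpose procedure, which gives all transposed variables a common prefix so they are easy to identify and concatenate with OF STATE_:. The sample data below uses OrgID and States in place of your ID and Answer.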

*create sample data for demonstration;
*values are separated by blanks, so the default delimiter works;
data have;
    input OrgID Product $ States $;
    cards;
1   football    DC
1   football    VA
1   football    MD
2   football    CA
3   football    NV
3   football    CA
;
run;

*Sort - required for both options;
proc sort data=have;
    by orgID;
run;

**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
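data want_option1;
    set have;
    by orgID;
    *must be long enough for every value in a group plus the separators;
    length combined $100;
    *keep the running value of COMBINED across rows;
    retain combined;

    *start a new list at the first row of each OrgID, otherwise append;
    if first.orgID then
        combined=states;
    else
        combined=catx(', ', combined, states);

    *write one row per OrgID once its list is complete;
    if last.orgID then
        output;
run;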
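The $100 length above is a guess. With hundreds of thousands of rows, one way to make sure no combined value is truncated is to compute an upper bound on the combined length before choosing the LENGTH. The sketch below assumes data in the question's layout (a data set named test with ID, Question and Answer); the sample rows and the macro variable name max_len are illustrative only. The bound counts each answer plus a two-character separator between answers, so it can only overestimate.

```sas
*Hypothetical sample data in the ID/Question/Answer layout from the question;
data test;
    infile cards dlm=',';
    input ID Question :$20. Answer :$10.;
    cards;
1,Referral to,a
1,Referral to,b
1,Referral to,d
2,Referral to,a
2,Referral to,c
4,Referral to,a
6,Referral to,a
6,Referral to,c
6,Referral to,d
;
run;

*Upper bound per ID and Question: answer lengths plus 2 characters per separator;
proc sql noprint;
    select max(total_len) into :max_len trimmed
    from (select ID, Question,
                 sum(lengthn(strip(Answer))) + 2*(count(*) - 1) as total_len
          from test
          group by ID, Question) as g;
quit;

%put NOTE: combined needs at most &max_len characters.;
```

Set the LENGTH of combined to at least the value written to the log.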

**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
    by orgID;
    var states;
run;

data want_option2;
    set wide;
    length combined $100;
    *OF STATE_: passes every variable whose name starts with STATE_ to CATX;
    combined=catx(', ', of state_:);
run;
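Applied to the question's own layout, where each ID can have several Question values ('Referral to' plus dozens of others), the RETAIN approach can be run BY ID Question, and PROC TRANSPOSE then turns each Question value into its own column. This is a sketch: the data set names test and test2 follow the question's code, the intermediate data set combined and the sample rows are illustrative, and OPTIONS VALIDVARNAME=V7 is assumed so that the value 'Referral to' becomes the variable name Referral_to.

```sas
*Assumption: V7 naming, so spaces in Question values become underscores;
options validvarname=v7;

*Hypothetical sample data in the ID/Question/Answer layout from the question;
data test;
    infile cards dlm=',';
    input ID Question :$20. Answer :$10.;
    cards;
1,Referral to,a
1,Referral to,b
1,Referral to,d
2,Referral to,a
2,Referral to,c
4,Referral to,a
6,Referral to,a
6,Referral to,c
6,Referral to,d
;
run;

proc sort data=test;
    by ID Question;
run;

*Combine all answers for each ID and Question pair into one value;
data combined;
    set test;
    by ID Question;
    length combined $200;
    retain combined;

    if first.Question then
        combined=Answer;
    else
        combined=catx(', ', combined, Answer);

    if last.Question then
        output;
    keep ID Question combined;
run;

*One row per ID, one column per distinct Question value;
proc transpose data=combined out=test2(drop=_name_);
    by ID;
    id Question;
    var combined;
run;
```

Because Question is in the BY statement, first.Question resets combined for every ID and Question pair, so each question's answers stay in their own cell, and the ID values are now unique within each BY group, so LET is no longer needed.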