1
votes

This is a follow-up to an earlier question of mine. Transposing Comma-delimited field

The answer I got worked for the specific case, but now I have a much larger dataset, so reading it in a datalines statement is not an option. I have a dataset similar to the one created by this process:

data MAIN;
    input ID STATUS STATE $;
cards;
123 7 AL,NC,SC,NY
456 6 AL,NC
789 7 ALL
;
run;

There are two problems here: 1: I need a separate row for each state in the STATE column 2: Notice the third observation says 'ALL'. I need to replace that with a list of the specific states, which I can get from a separate dataset (below).

data STATES;
    input STATE $;
cards;
AL
NC
SC
NY
TX
;
run;

So, here is the process I am attempting that doesn't seem to be working. First, I create a list of the STATES needed for the imputation, and a count of said states.

proc sql;
    select distinct STATE into :all_states separated by ','
    from STATES;
    select  count(distinct STATE) into :count_states
    from STATES;
quit;

Second, I try to impute that list where the 'ALL' value appears for STATE. This is where the first error appears. How can I ensure that the variable STATE is long enough for the new value? Also, how do I handle the commas?

data x_MAIN;
    set MAIN;
    if STATE='ALL' then STATE="&all_states.";
run;

Finally, I use a SCAN function to read in one state at a time. I'm also getting an error here, but I think fixing the above part may solve it.

data x_MAIN_mod;
    set x_MAIN;
    array state(&count_states.) state:;
    do i=1 to dim(state);
        state(i) = scan(STATE,i,',');
    end; 
run;

Thanks in advance for the help!

2

2 Answers

3
votes

Looks like you are almost there. Try this on the last Data Step.

data x_MAIN_mod;
    set x_MAIN;
    format out_state $2.;
    nstate = countw(state,",");
    do i=1 to nstate;
       out_state = scan(state,i,",");
       output;
    end; 
run;
2
votes

Do you have to actually have two steps like that? You can use a 'big number' in a temporary variable and not have much effect on things, if you don't have the intermediate dataset.

data x_MAIN;
    length state_temp $150;
    set MAIN;
    if STATE='ALL' then STATE_temp="&all_states.";
    else STATE_temp=STATE;
    array state(&count_states.) state:;
    do i=1 to dim(state);
        state(i) = scan(STATE,i,',');
    end; 
    drop STATE_temp;
run;

If you actually do need the STATE, then honestly I'd go with the big number (=50*3, so not all that big) and then add OPTIONS COMPRESS=CHAR; which will (give or take) turn your CHAR fields into VARCHAR (at the cost of a tiny bit of CPU time, but usually far less than the disk read/write time saved).