I have the following SAS data set:
data mydata;
LENGTH id 4.0 num 4.0 A $ 4. B $ 8. C $ 20.;
input id num A $ B $ C $;
datalines;
1 1 x yy zzzzz
2 1 x yy zzzzz
3 2 xq yyqq zzzzzqqqqq
4 1 x yy zzzzz
5 3 xqw yyqqww zzzzzqqqqqwwwww
6 1 x yy zzzzz
7 4 xqwe yyqqwwee zzzzzqqqqqwwwwweeeee
;
which looks like
mydata
-------------------
id num A B C
1 1 x yy zzzzz
2 1 x yy zzzzz
3 2 xq yyqq zzzzzqqqqq
4 1 x yy zzzzz
5 3 xqw yyqqww zzzzzqqqqqwwwww
6 1 x yy zzzzz
7 4 xqwe yyqqwwee zzzzzqqqqqwwwwweeeee
The problem is that each of the observations where num > 1 actually contains data for multiple "observations" and I would like to split it up using some logic in SAS. Here's an example for what I want to get:
mydatawanted
-------------------
id num A B C
1 1 x yy zzzzz
2 1 x yy zzzzz
3 1 x yy zzzzz
3 1 q qq qqqqq
4 1 x yy zzzzz
5 1 x yy zzzzz
5 1 q qq qqqqq
5 1 w ww wwwww
6 1 x yy zzzzz
7 1 x yy zzzzz
7 1 q qq qqqqq
7 1 w ww wwwww
7 1 e ee eeeee
Basically, if num > 1 I want to take the substring of each variable depending on its length, for each item, and then output those as new observations with num = 1. Here is what I have tried to code so far:
data mydata2(drop=i _:);
set mydata; /*use the data from the original data set */
_temp_id = id; /*create temp variables from the currently read observation */
_temp_num = num;
_temp_A = A;
_temp_B = B;
_temp_C = C;
if (_temp_num > 1) THEN /* if num in current record > 1 then split them up */
do i = 1 to _temp_num;
id = _temp_id; /* keep id the same */
num = 1; /* set num to 1 for each new observation */
A = substr(_temp_A,i,i); /*split the string by 1s */
B = substr(_temp_B,1 + 2 * (i - 1),i * 2); /*split the string by 2s */
C = substr(_temp_C,1 + 5 * (i - 1),i * 5); /*split the string by 5s */
OUTPUT; /* output this new observation with the changes */
end;
else OUTPUT; /* if num == 1 then output without any changes */
run;
However it doesn't work as I wanted it to (I put in some comments to show what I thought was happening at each step). It actually produces the following result:
mydata2
-------------------
id num A B C
1 1 x yy zzzzz
2 1 x yy zzzzz
3 1 x yy zzzzz
3 1 q qq qqqqq
4 1 x yy zzzzz
5 1 x yy zzzzz
5 1 qw qqww qqqqqwwwww
5 1 w ww wwwww
6 1 x yy zzzzz
7 1 x yy zzzzz
7 1 qw qqww qqqqqwwwww
7 1 we wwee wwwwweeeee
7 1 e ee eeeee
This mydata2 result isn't the same as mydatawanted. The lines where num = 1 are fine but when num > 1 the output records are much different from what I want. The total number of records are correct though. I'm not really sure what is happening, since this is the first time I tried any complicated SAS logic like this, but I would appreciate any help in either fixing my code or accomplishing what I want to do using any alternate methods. Thank you!
edit: I fixed a problem with my original input mydata data statement and updated the question.