1
votes

I need to create some cycle in SAS proc-sql which divide data into groups. I have data

ID1     ID2 TIME                    GROUP
1234    12  22MAY2015:16:10:00.000  0
1234    12  22MAY2015:16:15:00.000  0
1234    12  12JUN2015:6:35:00.000   0
1234    12  12JUN2015:16:35:00.000  0
6549    45  15APR2015:16:10:00.000  0
6549    45  18APR2015:13:15:00.000  0
6549    45  18APR2015:13:18:00.000  0
6549    15  22MAY2015:14:15:00.000  0
6549    15  22MAY2015:14:20:00.000  0

and I need create new column GROUP where will be the same id for those rows which have same ID1, same ID2 and difference between TIME is max 10 minutes.

Result will be:

ID1     ID2 TIME                    GROUP
1234    12  22MAY2015:16:10:00.000  1
1234    12  22MAY2015:16:15:00.000  1
1234    12  12JUN2015:6:35:00.000   2
1234    12  12JUN2015:16:35:00.000  3
6549    45  15APR2015:16:10:00.000  4
6549    45  18APR2015:13:15:00.000  5
6549    45  18APR2015:13:18:00.000  5
6549    15  22MAY2015:14:15:00.000  6
6549    15  22MAY2015:14:20:00.000  6

I have tried to write some 'do while' cycle but it does not work.

data b;
set a;
time1 = time;
id1_1 = id1;
id2_1 = id2;
time2 = time;
id1_2 = id1;
id2_2 = id2;
group = group+1;
do while (id1_1 eq id1_2 id2_1 eq id2_2 floor((time2-time1)/60)<=10);
group = group;
time2 = time;
id1_2 = id1;
id2_2 = id2;
end;
run;

Thank you very much.

1
Welcome to SO, you will get a better response if you post up any error messages you saw, or you explain why the code you've written doesn't work as you expected.mjsqu
Hi, this code is running with no output. It seems to me that it is looped. When I change it to until it receive some result but in a group column is the same number for every row. I do not know this method is good. I have no experiences with loops so I think there is better way or different way how to do it.Vendula

1 Answers

2
votes

Proc SQL is not a suitable tool for your problem. You have thought about 'Looping', which is a good start. However, the missing element in your code is the 'lagging' concept along with some other details. Your grouping condition has two parts: 1) Based on the natural group of ID1, ID2 2) On top of 1), additional group is generated if TIME is more than 10 minutes apart.

data have;
    input (ID1     ID2) (:$8.) TIME:datetime23.;
    format TIME:datetime23.;
    cards;
1234    12  22MAY2015:16:10:00.000  1
1234    12  22MAY2015:16:15:00.000  1
1234    12  12JUN2015:6:35:00.000   2
1234    12  12JUN2015:16:35:00.000  3
6549    45  15APR2015:16:10:00.000  4
6549    45  18APR2015:13:15:00.000  5
6549    45  18APR2015:13:18:00.000  5
6549    15  22MAY2015:14:15:00.000  6
6549    15  22MAY2015:14:20:00.000  6
;

data want;
    group+1; /*condition part 1*/

    do until (last.id2);
        set have;
        by id1 id2 notsorted;
        lag_time=lag(time);

        if not first.id2 then group+intck('minute',lag_time,time)>10; /*condition part 2*/
            output;
    end;

    drop lag_time;
run;