de-duplicating across SAS variables

Question

I have transposed some data based upon a condition and I would now like to de-duplicate the data across the variables. I would have done this before transposing the data but that would have left me with only one example of each variable where I actually need to some kind of conditional deduplication.

I started with data that looks like this;

data have;
input ID Event level1 $;
datalines;
12345 9008 Activity1
12345 9008 Activity1
12345 9008 Activity1
12345 9008 Activity1
12345 1001 Activity2
12345 1001 Activity2
12345 1002 Activity3
12345 1002 Activity3
12345 6009 Activity4
12345 9009 Activity1
12345 9008 Activity1
12345 1001 Activity2
23145 9008 Activity1
23145 1001 Activity2
23145 1001 Activity2
23145 1001 Activity2
23145 1002 Activity3
23145 6009 Activity4
23145 1003 Activity5
23145 1003 Activity5
23145 1003 Activity5
23145 1004 Activity6
35762 9008 Activity1
35762 1001 Activity2
35762 1002 Activity3
35762 1002 Activity3
35762 6009 Activity4
35762 6009 Activity4
24958 9008 Activity1
24958 1002 Activity3
24958 1002 Activity3
24958 1002 Activity3
24958 9009 Activity1
24958 9009 Activity2
24958 9009 Activity3
;
run;

This data is sorted based upon the ID and a time stamp (not shown here). I want to divide this up into journeys and transpose the data based upon the first 9008 or 9009 code. This I have managed to do by using the following code

data AS.TENMAY_JOUR_TRANS;
set AS.TENMAY_LEVEL1;
if first.EVENT and EVENT = 9008 or EVENT = 9009 then;
proc transpose data=AS.TENMAY_LEVEL1
                out=AS.TENMAY_JOUR_TRANS
                name=Journey;
by ID;
var level1;
run;
run;

This then gives this;

data have;
   input ID Journey $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13;
datalines;
12345 level1 Activity1 Activity1 Activity1 Activity1 Activity2 Activity2 Activity3 Activity3 Activity4
12345 level1 Activity1 Activity1 Activity2
23145 level1 Activity1 Activity2 Activity2 Activity2 Activity3 Activity4 Activity5 Activity5 Activity5 Activity6
35762 level1 Activity1 Activity2 Activity3 Activity3 Activity4 Activity4
24958 level1 Activity1 Activity3 Activity3 Activity3
24958 level1 Activity1 Activity2 Activity3
;
run;

however I want to get to this;

data have ;
input ID Journey $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13;
datalines;
12345 level1 Activity1 Activity2 Activity3 Activity4
23145 level1 Activity1 Activity2 Activity3 ACtivity4 Activity5 Activity6
35762 level1 Activity1 Activity3 Activity4
24958 level1 Activity1 Activity3
12345 level1 Activity1 Activity2
24958 level1 Activity1 Activity2 Activity3
;
run;

I cant deduplicate the initial data as I require some of the duplication to remain. I only need the level1 variable to be deduplicated between the first new 9009 or 9008 of each ID

I am supposing the best way to do this is through using the 'NODUP' or 'NODUPKEY' commands in a 'PROC SORT' statement and I am exploring options like

data AS.TENMAY_JOUR_TRANS;
    set AS.TENMAY_LEVEL11;
    if first.EVENT and EVENT = Activity0 or EVENT = Activity1 then;
    proc sort data= AS.TENMAY_JOUR_TRANS NODUP;
    proc transpose data=AS2.TENMAY_LEVEL11
                    out=AS2.TENMAY_JOUR_TRANS
                    name=Journey;
    by ID_INTERNET;
    var level1;
    run;
    run;
run;

I am not sure this is the best way to do this and if so, basically I am not sure whether to place the 'PROC SORT' step, inside or outside of the 'PROC TRANSPOSE' step.

Any help much appreciated.

Thanks!

I don't see how you managed to get any results by incorporating a proc step into a data step. SAS just won't allow that... — Dominic Comtois
I've got no idea either. What I typed above gave me those results though :-/ They are also seem to resemble what I expected to get when I wrote it. Bizarre. I have no idea why it works! — Taylrl
Did you check your log? What happened is probably that your data step has been ignored (because it contains errors, such as the then;) and only your proc transpose worked. — Dominic Comtois

Taylrl Taylrl · Accepted Answer · 2016-07-05T10:17:43

I managed to do this by creating a flag equal to 1 for each first entry of each type of Activity and then filtering based upon this before I do the first Transpose. This I did with the following code

data AS2.TENMAY_LOGFLAG;
    set AS2.TENMAY_LEVEL1;
    if level1 = 'Activity' then logflag =1;
run;

I then used EG to filter for 'where logflag = 1' and then did the Transpose

de-duplicating across SAS variables

3 Answers