I have transposed some data based upon a condition and I would now like to de-duplicate the data across the variables. I would have done this before transposing the data but that would have left me with only one example of each variable where I actually need to some kind of conditional deduplication.
I started with data that looks like this;
data have;
input ID Event level1 $;
datalines;
12345 9008 Activity1
12345 9008 Activity1
12345 9008 Activity1
12345 9008 Activity1
12345 1001 Activity2
12345 1001 Activity2
12345 1002 Activity3
12345 1002 Activity3
12345 6009 Activity4
12345 9009 Activity1
12345 9008 Activity1
12345 1001 Activity2
23145 9008 Activity1
23145 1001 Activity2
23145 1001 Activity2
23145 1001 Activity2
23145 1002 Activity3
23145 6009 Activity4
23145 1003 Activity5
23145 1003 Activity5
23145 1003 Activity5
23145 1004 Activity6
35762 9008 Activity1
35762 1001 Activity2
35762 1002 Activity3
35762 1002 Activity3
35762 6009 Activity4
35762 6009 Activity4
24958 9008 Activity1
24958 1002 Activity3
24958 1002 Activity3
24958 1002 Activity3
24958 9009 Activity1
24958 9009 Activity2
24958 9009 Activity3
;
run;
This data is sorted based upon the ID and a time stamp (not shown here). I want to divide this up into journeys and transpose the data based upon the first 9008 or 9009 code. This I have managed to do by using the following code
data AS.TENMAY_JOUR_TRANS;
set AS.TENMAY_LEVEL1;
if first.EVENT and EVENT = 9008 or EVENT = 9009 then;
proc transpose data=AS.TENMAY_LEVEL1
out=AS.TENMAY_JOUR_TRANS
name=Journey;
by ID;
var level1;
run;
run;
This then gives this;
data have;
input ID Journey $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13;
datalines;
12345 level1 Activity1 Activity1 Activity1 Activity1 Activity2 Activity2 Activity3 Activity3 Activity4
12345 level1 Activity1 Activity1 Activity2
23145 level1 Activity1 Activity2 Activity2 Activity2 Activity3 Activity4 Activity5 Activity5 Activity5 Activity6
35762 level1 Activity1 Activity2 Activity3 Activity3 Activity4 Activity4
24958 level1 Activity1 Activity3 Activity3 Activity3
24958 level1 Activity1 Activity2 Activity3
;
run;
however I want to get to this;
data have ;
input ID Journey $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13;
datalines;
12345 level1 Activity1 Activity2 Activity3 Activity4
23145 level1 Activity1 Activity2 Activity3 ACtivity4 Activity5 Activity6
35762 level1 Activity1 Activity3 Activity4
24958 level1 Activity1 Activity3
12345 level1 Activity1 Activity2
24958 level1 Activity1 Activity2 Activity3
;
run;
I cant deduplicate the initial data as I require some of the duplication to remain. I only need the level1 variable to be deduplicated between the first new 9009 or 9008 of each ID
I am supposing the best way to do this is through using the 'NODUP' or 'NODUPKEY' commands in a 'PROC SORT' statement and I am exploring options like
data AS.TENMAY_JOUR_TRANS;
set AS.TENMAY_LEVEL11;
if first.EVENT and EVENT = Activity0 or EVENT = Activity1 then;
proc sort data= AS.TENMAY_JOUR_TRANS NODUP;
proc transpose data=AS2.TENMAY_LEVEL11
out=AS2.TENMAY_JOUR_TRANS
name=Journey;
by ID_INTERNET;
var level1;
run;
run;
run;
I am not sure this is the best way to do this and if so, basically I am not sure whether to place the 'PROC SORT' step, inside or outside of the 'PROC TRANSPOSE' step.
Any help much appreciated.
Thanks!
proc
step into adata
step. SAS just won't allow that... – Dominic Comtoisthen;
) and only your proc transpose worked. – Dominic Comtois