I have a data set given in CSV format as follows:
Policy_Number,var1,var2,var3,Exposure
1,B,H,J,191
2,B,F,Unknown,174
3,C,Unknown,I,153
4,B,G,L,192
5,Unknown,E,Unknown,184
6,D,E,K,113
7,C,Unknown,I,140
8,A,H,I,133
9,C,F,I,194
10,Unknown,G,Unknown,105
11,B,H,L,172
12,A,Unknown,I,198
13,D,E,K,155
14,Unknown,G,K,177
15,B,H,Unknown,100
16,D,Unknown,J,176
17,B,E,I,112
18,Unknown,E,J,192
19,C,Unknown,K,146
20,C,G,Unknown,187
i want to find out the highest and Second highest 'Exposure' values for each Variable that is Var1, Var2, Var3.
subsequently the output file must look like:
VariableName VariableValue Max_Exposure Exposure_Percentage
Var1 A 198
Var1 A 133
Var1 B 192
Var1 B 191
Var1 C 194
Var1 C 187
Var1 D 176
Var1 D 155
Var1 Unknown 192
Var1 Unknown 184
Var2 E 192
Var2 E 184
Var2 F 194
Var2 F 174
Var2 G 192
Var2 G 187
Var2 H 191
Var2 H 172
Var2 Unknown 198
Var2 Unknown 176
Var3 I 198
Var3 I 194
Var3 J 192
Var3 J 191
Var3 K 177
Var3 K 155
Var3 L 192
Var3 L 187
Var3 Unknown 184
Var3 Unknown 194
Where Exposure_Percentage is the percentage of the exposure of each VariableValue.
i am doing it with a very raw approach. i am creating 3 tables and finding out the Maximum and second maximum exposure for each table and then i am merging those 3 tables. but if the VariableName are increased that is more columns are introduced in the data then my approach will fail. the code for the same be like as follows:
data try2;
infile 'complex.csv' dsd dlm = ',' FIRSTOBS = 2;
Length Policy_Number Var1 $ 10 Var2 $ 10 Var3 $ 10 Exposure 3;
input Policy_Number $ Var1 $ Var2 $ Var3 $ Exposure;
run;
proc sort data = try2;
by Var3 descending Exposure;
run;
data need2;
set try2;
by Var1;
if first.var1 then n=0;
n+1;
if n=1 then output;
if n=2 then output;
run;
Please show me possible solution to such problems. Thanks in advance.