*****i have 4 million observations. i have a variable X which has 100 unique observations with 5 digit codes assigned to it(e.x. 10420, 49660, 52739,65231,etc.) and arranged in ascending order. in each of these unique X's, i have variable Y which has codes from 1-4. now i need to calculate weighted average by each codes of Y and also weighted average by each codes of Y across X. So there are two stages: inner stage is to calculate weighted average by each codes of Y for each X's and outer stage is to calculate weighted average by each codes of Y for overall X**.***
X wrkallmt medtt_all Y
----- -------- --------- --
10420 15 25.4 1
10420 50 5.7 4
10420 70 5.9 3
10420 110 5.9 4
10420 110 15 2
12630 30 15.8 4
12630 15 10.9 2
12630 85 10.5 2
12630 20 20.7 1
12630 20 23.5 3
13560 20 20.3 3
13560 20 20.8 3
13560 30 20.9 4
13560 20 25.5 2
13560 25 30.6 4
13560 20 45.3 1
13560 20 40.4 4
13560 20 30.9 1
13560 10 10.8 2
inner stage: for each X's suppose x = 10420
weighted average for code 1 is (15*25.4)/15
, code 4 is ((50*5.7)+(110*5.9))/(50+110)
.
outer stage: weighted average for code 1 is (weithtedavg of code 1 in 10420 + weightedavg of code 1 in 12630 + weightedavg of code 1 in 13560)/(sum of wrkallmt in code 1 across x)
I can only think of using nested DO-LOOP. i figured out the inner stage but cannot construct the outer stage:
data WORK.exp3;
SET WORK.ABC;
do X = ??;
A = 0;
B = 0;
wgtavg_all = 0;
do Y = 1 to 4;
A = sum(wrkallmt*medtt_all);
B = sum(wrkallmt);
wgtavg_all = A/B;
output;
end;
output;
end;
run;
I'm using SAS 9.3. Can i use proc sql using group by? but in that case i need to use group by two times to get the desired result.
PROC SUMMARY
you can use the_TYPE_
variable in the output dataset to choose the levels of summarisation. If you haveclass X Y;
then the summarisation will cover 4 levels: 0 - Overall (whole dataset), 1 - justY
, 2 - justX
and 3 - stats forX
andY
. The number is the value of_TYPE_
. – mjsqu