0
votes

*****i have 4 million observations. i have a variable X which has 100 unique observations with 5 digit codes assigned to it(e.x. 10420, 49660, 52739,65231,etc.) and arranged in ascending order. in each of these unique X's, i have variable Y which has codes from 1-4. now i need to calculate weighted average by each codes of Y and also weighted average by each codes of Y across X. So there are two stages: inner stage is to calculate weighted average by each codes of Y for each X's and outer stage is to calculate weighted average by each codes of Y for overall X**.***

X      wrkallmt  medtt_all  Y
-----  --------  ---------  --
10420  15        25.4       1
10420  50        5.7        4
10420  70        5.9        3
10420  110       5.9        4
10420  110       15         2
12630  30        15.8       4
12630  15        10.9       2
12630  85        10.5       2
12630  20        20.7       1
12630  20        23.5       3
13560  20        20.3       3
13560  20        20.8       3
13560  30        20.9       4
13560  20        25.5       2
13560  25        30.6       4
13560  20        45.3       1
13560  20        40.4       4
13560  20        30.9       1
13560  10        10.8       2

inner stage: for each X's suppose x = 10420 weighted average for code 1 is (15*25.4)/15, code 4 is ((50*5.7)+(110*5.9))/(50+110). outer stage: weighted average for code 1 is (weithtedavg of code 1 in 10420 + weightedavg of code 1 in 12630 + weightedavg of code 1 in 13560)/(sum of wrkallmt in code 1 across x)


I can only think of using nested DO-LOOP. i figured out the inner stage but cannot construct the outer stage:

data WORK.exp3;
   SET WORK.ABC;
   do X = ??;
      A = 0;
      B = 0;
      wgtavg_all = 0;
      do Y = 1 to 4;
         A = sum(wrkallmt*medtt_all);
         B = sum(wrkallmt);
         wgtavg_all = A/B;
         output;
      end;
      output;
   end;
run; 

I'm using SAS 9.3. Can i use proc sql using group by? but in that case i need to use group by two times to get the desired result.

1
Can you give a few rows of example data and the desired result?Brian DeMilia
Take a look at PROC SUMMARY you can use the _TYPE_ variable in the output dataset to choose the levels of summarisation. If you have class X Y; then the summarisation will cover 4 levels: 0 - Overall (whole dataset), 1 - just Y, 2 - just X and 3 - stats for X and Y. The number is the value of _TYPE_.mjsqu
@ brian demilia : i have added few rows of example and the desired result. let me know if that helpsuser3393041

1 Answers

0
votes

There is certainly more than one way to do this, but see if this is close to what you want. This requires 2 sorts, but takes advantage of the BY statement (and the LAST temp variable) in the DATA STEP.

data have;
 length x y 8.;
 Input X wrkallmt medtt_all Y;
 datalines;
10420 15 25.4 1
10420 50 5.7 4
10420 70 5.9 3
10420 110 5.9 4
10420 110 15 2
12630 30 15.8 4
12630 15 10.9 2
12630 85 10.5 2
12630 20 20.7 1
12630 20 23.5 3
13560 20 20.3 3
13560 20 20.8 3
13560 30 20.9 4
13560 20 25.5 2
13560 25 30.6 4
13560 20 45.3 1
13560 20 40.4 4
13560 20 30.9 1
13560 10 10.8 2
;

proc sort data=have out=sortedx;
 by x y;
run;

Data need1(keep=x y weighted_avg_combined_y);
 retain a b 0;
 set sortedx;
 by x y;

 a = a + (wrkallmt * medtt_all);
 b = b + wrkallmt;

 if last.y then do;
  weighted_avg_combined_y=a/b;
  output ;
 end;

run;

Proc sort data=have out=sortedy;
 by y x;
run;

Data need2(keep=x y weighted_avg_all_y);
 retain a b 0;
 set sortedy;
 by y x;

 a = a + (wrkallmt * medtt_all);
 b = b + wrkallmt;

 if last.y then do;
  weighted_avg_all_y=a/b;
  output ;
 end;

run;