0
votes

Following the question asked about throwing out the trimmed mean of the proc univariate in a table :
SAS: PROC UNIVARIATE: Output trimmed mean to dataset

I would like to output a trimmed mean from a proc univariate by group. However the ods output does not seem to work with noprint and there are just too many group id for it to work out . Any sidestep to this problem?

proc univariate data = Table1  idout trim=1;
var DaysBtwPay;
by id;
trimmedmeans = trimMean2 (keep = id Mean stdMean);
run;
2

2 Answers

1
votes

I can't think of anyway around this issue other than to write your own data step to calculate trimmed mean. This can be done in 2 steps.

Step-1:

In this step we would want to know how many observations are there in each by-group and the simple average of your measurement variable. In the next step, simple average will be returned when calculating trimmed mean is infeasible. For example: If you want to exclude the extreme 5 obs but you only 7 observations in a by-group then proc univarite returns a missing value. Notice the order by clause - this ordering is used in excluding extreme obs.

proc sql;
create table inputForTmeans as 
select
a.region /*your by-group var*/
,a.returns /*your measurement variable of interest*/
,b.count
,b.simpleAvgReturns
from sashelp.shoes as a
inner join (select region, count(*) as count, mean(returns) as simpleAvgReturns
                   from sashelp.shoes 
                    group by region) as b
on a.region = b.region
order by 
a.region
,a.returns;
quit;

Step-2:

%let trimmed = 1; /*no. of extreme obs to exclude from mean calculation*/
data trimmedMean;
set inputForTmeans;
row_count+1; /*counter variable to number each obs in a by-group*/
by region returns;
if first.region then do;
                row_count=1;
                returnsSum=.;
                end;
if &trimmed.<row_count <=(count - &trimmed.) then returnsSum+returns;
/***************************************************************************/
if last.region then do;
        trimmedMeanreturns = coalesce(returnsSum/(count - 2*&trimmed.), simpleAvgReturns) ;
        N = row_count;
        trimmedRowCount = 2*&trimmed.;
        output;
    end;
keep region trimmedMeanreturns N count trimmedRowCount ;
/***************************************************************************/
run;

Output: %let trimmed = 1;

region                     DataStep      ProcUnivariate
Africa                     1183.962963   1183.963
Asia                       662           662
Canada                     3089.1428571  3089.143
Central America/Caribbean  3561.1333333  3561.133
Eastern Europe             2665.137931   2665.138
Middle East                6794.3181818  6794.318
Pacific                    1538.5813953  1538.581
South America              1824.1153846  1824.115
United States              4462.4210526  4462.421
Western Europe             2538.05       2538.05

%let trimmed = 14;

region                    DataStep     ProcUnivariate
Africa                    897.92857143 897.9
Asia                      778.21428571 .
Canada                    1098.1111111 1098.1
Central America/Caribbean 2289.25      2289.3
Eastern Europe            2559.6666667 2559.7
Middle East               8620         .
Pacific                   895.88235294 895.9
South America             1538.5769231 1538.6
United States             4010.4166667 4010.4
Western Europe           1968.5882353  1968.6

Output from the datastep for trimmed=1:

Count: No. of rows in the by-group

N: Ignore this column - same as Count

trimmedRowCount: no. of extreme rows excluded. If trimmedRowCount = Count then trimmedMeanreturns is the SimpleAverage

Region                    count trimmedMeanreturns N trimmedRowCount
Africa                    56    1183.963           56     2
Asia                      14    662                14     2
Canada                    37    3089.143           37     2
Central America/Caribbean 32    3561.133           32     2
Eastern Europe            31    2665.138           31     2
Middle East               24    6794.318           24     2
Pacific                   45    1538.581           45     2
South America             54    1824.115           54     2
United States             40    4462.421           40     2
Western Europe            62    2538.05            62     2
1
votes

Doh!, it appears you can use ods _all_ close; option to suppress the HTML output instead of going through the trouble of writing your own datastep routine.

%let trimmed = 1; 
proc sort data=sashelp.shoes out=have;
by region;
run;
ods _all_ close;
PROC UNIVARIATE DATA=have trimmed=&trimmed. ;
VAR returns;
by region;
ods output TrimmedMeans=trimmedMeansUni  ;
run;