I can't think of anyway around this issue other than to write your own data step to calculate trimmed mean
.
This can be done in 2 steps.
Step-1:
In this step we would want to know how many observations are there in each by-group
and the simple average of your measurement variable. In the next step, simple average will be returned when calculating trimmed mean
is infeasible. For example: If you want to exclude the extreme 5 obs but you only 7 observations in a by-group
then proc univarite
returns a missing value.
Notice the order by
clause - this ordering is used in excluding extreme obs.
proc sql;
create table inputForTmeans as
select
a.region /*your by-group var*/
,a.returns /*your measurement variable of interest*/
,b.count
,b.simpleAvgReturns
from sashelp.shoes as a
inner join (select region, count(*) as count, mean(returns) as simpleAvgReturns
from sashelp.shoes
group by region) as b
on a.region = b.region
order by
a.region
,a.returns;
quit;
Step-2:
%let trimmed = 1; /*no. of extreme obs to exclude from mean calculation*/
data trimmedMean;
set inputForTmeans;
row_count+1; /*counter variable to number each obs in a by-group*/
by region returns;
if first.region then do;
row_count=1;
returnsSum=.;
end;
if &trimmed.<row_count <=(count - &trimmed.) then returnsSum+returns;
/***************************************************************************/
if last.region then do;
trimmedMeanreturns = coalesce(returnsSum/(count - 2*&trimmed.), simpleAvgReturns) ;
N = row_count;
trimmedRowCount = 2*&trimmed.;
output;
end;
keep region trimmedMeanreturns N count trimmedRowCount ;
/***************************************************************************/
run;
Output:
%let trimmed = 1;
region DataStep ProcUnivariate
Africa 1183.962963 1183.963
Asia 662 662
Canada 3089.1428571 3089.143
Central America/Caribbean 3561.1333333 3561.133
Eastern Europe 2665.137931 2665.138
Middle East 6794.3181818 6794.318
Pacific 1538.5813953 1538.581
South America 1824.1153846 1824.115
United States 4462.4210526 4462.421
Western Europe 2538.05 2538.05
%let trimmed = 14;
region DataStep ProcUnivariate
Africa 897.92857143 897.9
Asia 778.21428571 .
Canada 1098.1111111 1098.1
Central America/Caribbean 2289.25 2289.3
Eastern Europe 2559.6666667 2559.7
Middle East 8620 .
Pacific 895.88235294 895.9
South America 1538.5769231 1538.6
United States 4010.4166667 4010.4
Western Europe 1968.5882353 1968.6
Output from the datastep
for trimmed=1
:
Count
: No. of rows in the by-group
N
: Ignore this column - same as Count
trimmedRowCount
: no. of extreme rows excluded. If trimmedRowCount
= Count
then trimmedMeanreturns
is the SimpleAverage
Region count trimmedMeanreturns N trimmedRowCount
Africa 56 1183.963 56 2
Asia 14 662 14 2
Canada 37 3089.143 37 2
Central America/Caribbean 32 3561.133 32 2
Eastern Europe 31 2665.138 31 2
Middle East 24 6794.318 24 2
Pacific 45 1538.581 45 2
South America 54 1824.115 54 2
United States 40 4462.421 40 2
Western Europe 62 2538.05 62 2