This is a variation of the knapsack problem. You are trying to find a subset of objects (here, 3 people), such that their properties come close to some specified target values (here, the sum [or mean] and corrected sum of squares [or std dev]). This is also known as a moment-matching problem.
As stated, the problem is not well defined. You need to specify an objective function to minimize. For example, you might choose the function
(mean-target_mean)**2 + (stdDev - target_stdDev)**2
where (mean, stdDev) are the moments for each sampe of size 3.
For small sets (as in your example), you can do a complete enumeration of the "N choose 3" combinations to figure out which combination to choose. See the article "Generate combinations in SAS" for tips. For example, in SAS/IML, you can solve the stated problem as follows:
data A;
length NAME $5.;
input NAME $ x1 x2;
datalines;
Bill 3 2
James 4 5
Rick 5 6
Bob 3 7
Clas 5 2
Brye 2 9
Mann 8 5
Pids 4 8
Tony 0 7
Lou 2 6
;
proc iml;
use A; read all var {Name x1}; close;
N = nrow(x1); /* number of obs */
k = 3; /* size of subset */
targetMean = 4;
targetStd = 0.95;
idx = allcomb(N, k); /* all M='N choose 3' combinations */
X = shape( x1[idx], nrow(idx) );
mean = mean(X`); /* 1 x M vector of sample means */
std = std(X`); /* 1 x M vector of sample std devs */
objective = (mean - targetMean)##2 + (std - targetStd)##2;
minVal = objective[><]; /* minimize objective */
minIdx = objective[>:<]; /* a sample that achieves minimum */
sampNames = Name[idx[minIdx,]];
sampVals = x1[idx[minIdx,]];
print sampVals[rowname=sampNames];
Of course, there might be more than one solution. This example has 8 solutions.
For problems in which there are N items and you want a subset of size k, and for which 'N choose k' is prohibitively large, you can generate random subsets by using the RANCOMB function (or PROC SURVEYSELECT, as someone mentioned). Alternately, you can formulate this problem as an optimization problem. You can use algorithms in SAS/OR or SAS/IML to solve it. For moderate-sized subsets, you could use genetic algorithms in SAS/IML, which are useful for knapsack-like problems.
proc surveyselect
and repeat if you get too far off. You can usecontrol
sorting to make it a bit more likely that it's closer, but I'm not sure that would be appropriate here. But perhaps someone else will know of one. – Joe