I would like to simulate 1000 samples from my original data set using PROC SURVEYSELECT. I don't want SAS to output 1000 sampled data sets because it would take up a lot of space. How do I create 1000 indicator variables and attach them to my original data set? Each of these indicator variables will have a value of 1 if my observation is selected in the replicate, and 0 otherwise.
The canonical paper on this is David Cassell's "Don't Be Loopy".
The core of it is this step:
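proc surveyselect data=YourData out=outboot /* input and output data sets */
    seed=30459584  /* fixed seed so the samples are reproducible */
    method=urs     /* unrestricted random sampling, i.e. with replacement */
    samprate=1     /* each sample is the same size as the input */
    outhits        /* one output record per hit */
    rep=1000;      /* number of replicates */
run;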
where outhits writes a separate record each time a row is sampled (a row sampled 2 or more times becomes 2 or more records), and a variable REPLICATE is created to store the replicate number.
It only creates one variable (with values 1-1000), but that's normally desirable: you can then run the analysis with BY REPLICATE; and get your results.
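As a minimal sketch of that BY REPLICATE pattern, the steps below compute the mean of one variable within each replicate and then take percentiles of those 1000 means. The variable x and the data set names bootmeans and bootci are hypothetical; substitute your own analysis variable and whatever procedure you actually need.

```sas
/* One mean per replicate. outboot is already ordered by Replicate. */
proc means data=outboot noprint;
    by replicate;
    var x;                              /* hypothetical analysis variable */
    output out=bootmeans mean=mean_x;
run;

/* 2.5th and 97.5th percentiles of the replicate means */
proc univariate data=bootmeans noprint;
    var mean_x;
    output out=bootci pctlpts=2.5 97.5 pctlpre=ci_;
run;
```

This keeps everything in one long data set and one pass per procedure, which is the point of avoiding a macro loop over 1000 separate samples.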
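If you need to turn it into 1000 variables/one row per ID, sort the SurveySelect output by your ID variable (it comes out ordered by replicate) and then use an array in a data step. Resetting the array to 0 at the start of each ID gives you 1/0 indicators rather than 1/missing:
proc sort data=outboot;
    by [id-variable];
run;

data want;
    set outboot;
    by [id-variable];
    array rep[1000];
    retain rep1-rep1000;
    /* start every ID with all indicators set to 0 */
    if first.[id-variable] then do i = 1 to dim(rep);
        rep[i] = 0;
    end;
    rep[replicate] = 1;   /* selected in this replicate */
    /* write one row per ID once all of its records are read */
    if last.[id-variable] then output;
    drop i replicate;
run;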
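If the main concern is space, one variation (a sketch, not something taken from the paper mentioned above) is to leave out OUTHITS. With method=urs, PROC SURVEYSELECT then writes one record per selected row per replicate and stores the number of times it was drawn in the NumberHits variable, so the intermediate data set is smaller and the array can hold hit counts instead of flags. The names outcounts, want_counts and hits are made up for illustration, and [id-variable] is the same placeholder used in the answer.

```sas
/* Same sampling as before, but without OUTHITS:
   one record per selected row per replicate, plus NumberHits. */
proc surveyselect data=YourData out=outcounts
    seed=30459584
    method=urs
    samprate=1
    rep=1000;
run;

/* BY-group processing needs the data ordered by the ID variable. */
proc sort data=outcounts;
    by [id-variable];
run;

data want_counts;
    set outcounts;
    by [id-variable];
    array hits[1000];
    retain hits1-hits1000;
    /* start every ID with a count of 0 in all replicates */
    if first.[id-variable] then do i = 1 to dim(hits);
        hits[i] = 0;
    end;
    hits[replicate] = NumberHits;   /* times drawn in this replicate */
    if last.[id-variable] then output;
    drop i replicate NumberHits;
run;
```

A 1/0 indicator can still be recovered from this layout, since a row was selected in a replicate whenever its count is greater than 0.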