
I would like to simulate 1000 samples from my original data set using PROC SURVEYSELECT. I don't want SAS to output 1000 sampled data sets because it would take up a lot of space. How do I create 1000 indicator variables and attach them to my original data set? Each of these indicator variables will have a value of 1 if my observation is selected in the replicate, and 0 otherwise.


The canonical paper on this is David Cassell's "Don't Be Loopy".

The core of the approach is this PROC SURVEYSELECT step:

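proc surveyselect data=YourData out=outboot /* OUTBOOT holds all replicates in one data set */
  seed=30459584   /* fixed seed so the replicates are reproducible */
  method=urs      /* unrestricted random sampling, i.e. with replacement */
  samprate=1      /* each replicate draws as many rows as the input has */
  outhits         /* one output record per draw, so repeated draws appear as repeated rows */
  rep=1000;       /* number of replicates */
run;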

Here OUTHITS writes one output record per selection (a row drawn two or more times in a replicate appears as two or more records), and PROC SURVEYSELECT adds a variable named Replicate that holds the replicate number.

This creates a single variable (with values 1 to 1000) rather than 1000 of them, but that is normally what you want: you can then run the analysis with a BY REPLICATE; statement and get one set of results per replicate.
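As a sketch of the BY REPLICATE pattern, the steps below compute a bootstrap percentile interval for a mean. They use SASHELP.CLASS (shipped with SAS) and its Height variable as stand-in data. The data set names BOOTMEANS and BOOT_CI and the variable mean_height are illustrative choices, not part of the original answer.

```sas
/* Stand-in data: SASHELP.CLASS ships with SAS; swap in your own data set */
proc surveyselect data=sashelp.class out=outboot
  seed=30459584 method=urs samprate=1 outhits reps=1000 noprint;
run;

/* One analysis per replicate: mean Height in each bootstrap sample.   */
/* OUTBOOT is already ordered by Replicate, so no sort is needed here. */
proc means data=outboot noprint;
  by replicate;
  var height;
  output out=bootmeans mean=mean_height;
run;

/* Percentile 95% interval from the 1000 replicate means */
proc univariate data=bootmeans noprint;
  var mean_height;
  output out=boot_ci pctlpts=2.5 97.5 pctlpre=p_;   /* creates p_2_5 and p_97_5 */
run;

proc print data=boot_ci;
run;
```

The point of the design is that 1000 samples never become 1000 data sets or 1000 PROC calls: one BY-group run produces all replicate statistics at once.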

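If you do need 1000 indicator variables with one row per ID, add a DATA step after PROC SURVEYSELECT that uses an array. OUTBOOT is ordered by Replicate, so sort it by the ID variable first (the ID must uniquely identify rows of YourData). In the code below, id stands in for your ID variable, and every indicator starts at 0 so that replicates in which a row was not drawn get 0 rather than a missing value:

proc sort data=outboot;
  by id;                      /* replace id with your ID variable */
run;

data want;
  set outboot;
  by id;
  array rep[1000];            /* creates rep1-rep1000 */
  retain rep1-rep1000;
  if first.id then do _i = 1 to dim(rep);
    rep[_i] = 0;              /* reset all indicators for a new ID */
  end;
  rep[replicate] = 1;         /* flag each replicate this ID was drawn in */
  if last.id then output;     /* one row per ID */
  keep id rep1-rep1000;
run;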
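The collapsed data set only contains IDs that were drawn at least once, so attaching the indicators to the original data takes one more step. A minimal sketch, assuming YourData has one row per value of a hypothetical ID variable id and that the per-ID indicators rep1-rep1000 are in WANT: sort a copy of the original data, match-merge, and turn any missing indicator into 0. ORIG_SORTED and FINAL are illustrative names.

```sas
/* Sort a copy so the original data set is left untouched */
proc sort data=YourData out=orig_sorted;
  by id;                                    /* hypothetical ID variable */
run;

data final;
  merge orig_sorted(in=inorig) want;
  by id;
  if inorig;                                /* keep exactly the original rows */
  array rep[1000];                          /* rep1-rep1000, read from WANT */
  do _i = 1 to dim(rep);
    if missing(rep[_i]) then rep[_i] = 0;   /* not drawn in that replicate: 0 */
  end;
  drop _i;
run;
```

With IF INORIG, FINAL has one row per original row, and rows that never appear in WANT still get a full set of zero indicators.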