0
votes

There are other threads out there about randomly selecting rows using other languages, and furthermore my question involves how to define variables from the columns of each randomly selected row.

First I import my data:

proc import OUT = WORK.ROWS
        DATAFILE = "C:\rows.xlsx"
        DBMS = EXCEL REPLACE; GETNAMES = YES; 
run;
proc print;
run;

Which consists of 10 rows, each row containing a pair of two variables (var1, var2) in separate columns. It looks like this:

obs var1    var2
1   0.8828  0.2245
2   0.8833  0.3109
3   0.8699  0.1579
4   0.9035  0.2993
5   0.9641  0.3590
6   0.8846  0.2542
7   0.8752  0.1343
8   0.9309  0.1188
9   0.9018  0.1761
10  0.8832  0.1439

Then, within a DO loop,

DO n = 1 TO 1000;       *number of simulations to run;

I would like to randomly draw a single row from the input data file and copy the value of the two variables (var1 and var2) from the row I randomly drew for further use in the same DO loop. In other words, for each iteration of the DO loop, I need to define a new random pair of variables that originated from the same row.

For example, the random values for simulations n = 1 through n = 4 for var1 and var2 could have come from obs (rows) 2, 2, 10, 4:

Simulation #   
n = 1
    var1 = 0.8846   
    var2 = 0.2542
n = 2
    var1 = 0.8846   
    var2 = 0.2542
n = 3
    var1 = 0.9309   
    var2 = 0.1188
n = 4
    var1 = 0.8832   
    var2 = 0.1439

Thank you in advance for your help.

2
I don't think you explain yourself sufficiently here. But, I would look at Don't be LOOPy for a start - see if that answers your question. That assumes you're doing something like bootstrap analysis. If that doesn't answer your question - then I suggest you make a "Want" dataset that has the results you'd want from a run with DO n=1 to 30 or something reasonable.Joe
Thanks @Joe, I tried to clarify my question a bit. Thanks for pointing me to Don't be LOOPy. I'm not sure it answers my question but I will look over it again.Gavin M. Jones
So, you want 1000 rows like the above? In that case you are doing sampling with replacement, and Don't Be LOOPy can show you that. (Really, this is either a n=1000 samplesize=1 bootstrap, or a n=1 samplesize=1000 one - not sure which.)Joe

2 Answers

2
votes

This is effectively a very tiny bootstrap, with 1000 replicates of sample size one. If that's exactly what you want, then, bingo, you have it here, in PROC SURVEYSELECT:

data have;
input obs var1    var2;
datalines;
1   0.8828  0.2245
2   0.8833  0.3109
3   0.8699  0.1579
4   0.9035  0.2993
5   0.9641  0.3590
6   0.8846  0.2542
7   0.8752  0.1343
8   0.9309  0.1188
9   0.9018  0.1761
10  0.8832  0.1439
;;;;
run;

proc surveyselect data=have out=want seed=7 outhits
                  method=urs sampsize=1 rep=1000;
run;

Now, if you really want a normal bootstrap analysis (where sample size is bigger than one - usually it's the same as your initial sample's size), you can just jigger with the sampsize= and rep= quantities until you get what you expect.

For further reading, Don't be LOOPy by David Cassell is the classic paper on the subject.

0
votes

You can do this using the point= option on the set statement. First figure out how many observations are in your dataset that you want to pull the data from. Save the number of observations into a macro var called nobs:

data _null_;
  set sashelp.class nobs=i;
  if _n_ eq 2 then stop;
  call symputx ('nobs',i);
run;

Check we're getting the expected value:

%put &=nobs;

Specify how many times you want to loop:

%let loops = 1000;

Now in a dataset, initiate a loop with the set statement inside. We're going to calculate a random number between 1 and the number of rows in the table. We're then going to issue a set statement pointing directly to that row. Once we have that, do your thing and output the record. Once we have iterated the desired number of times, force the datastep to terminate:

data want;
  do cnt=1 to &loops;
    random_obs = floor(rand("Uniform")*100/(100/&nobs))+1; * BETTER CHECK THIS MATH IF YOU NEED TO BE REALLY ACCURATE;
    set sashelp.class(keep=age sex) point=random_obs;
    * DO YOUR THING;
    output;
  end;
  stop;
run;

EDIT : I forgot to mention that sometimes it's useful to select the same 'random' observations every time you run it. If you want to do this to assist with testing you will need to add this line to the top of your dataset:

call streaminit(123); /* set random number seed */