3
votes

Suppose I have a SAS dataset that looks like this:

id  x
1   1234
2   2345
3   3456

I need a new data set that has this data set read in (say) 2 times, with a new variable indicating which "replication" this is:

id  x     rep
1   1234  1
2   2345  1
3   3456  1
1   1234  2
2   2345  2
3   3456  2

It is important that the data are read in this exact order -- the entire initial data set is read once, then again, etc.

Any ideas on an efficient way to do this in a data step? (In reality my data set is huge, I need to read it several times, and I want to avoid sorting.)

I tried this, but the order of the observations in the new data set is not what I want:

data foo;
 set tmp; rep=1; output;
 set tmp; rep=2; output;
run;
3

3 Answers

8
votes

If you want to keep to data step, then this will work as you described.

data foo;
  set tmp (in=INA) tmp (in=INB);
  if INA then REP=1;
  if INB then REP=2;
run;
1
votes
data rep;
  set tmp;

  do rep = 1 to 2; /* or 3, or whatever */
    output;
  end;
proc sort;
  by rep id;
run;

That's it.

0
votes

You could try it using a view and proc append like this:

/* create view for rep=2 */

data rep2 / view=rep2;
 set tmp;
 rep = 2;
run;

/* create dataset for rep=1 */

data foo;
 set tmp;
 rep = 1;
run;

/* append rep=2 to rep=1 dataset */

proc append base=foo data=rep2;
run;