
I am doing a practice assignment as part of my Base SAS certification prep, to see when a data step ends. Below is the code:

data first; 
input x;
datalines; 
1
2
9
;
run;

data second;
input x;
datalines;
3
4
5
6
;
run;
data third;
set first;
output;
set second;
output;
run;

The output is: 1 3 2 4 9 5

But when the first dataset contains only two values, 1 and 2, the output is 1 2 3 4 and not 1 3 2 4. Why is that?

I tested your method; the result I got was 1 3 2 4, not 1 2 3 4. - Shenglin Chen
Oh OK. Thank you, Shenglin. Then I think it could be something to do with the SAS environment. - Naga Vemprala
@NagaVemprala - it's to do with the way the data step processes; I've described the process to explain what's happening in my answer below. - Bendy

3 Answers


The data step runs as an implicit DO loop. So when you consider your data step...

data third;
  set first;
  output;
  set second;
  output;
run;

...your two SET statements both act as a drip-feed, each providing one observation from its corresponding dataset on every iteration through the data step loop.
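To make the "implicit loop" idea visible, here is a minimal sketch that writes the loop out explicitly with a DO WHILE loop. It relies on the documented behavior that a SET statement trying to read past the end of its dataset stops the whole data step, so the loop needs no exit condition of its own. The dataset name third_explicit is hypothetical, and the input datasets are recreated with `@@` list input only to keep the block short. Because x comes from a SET statement it is retained anyway, so for this example the result should match the implicit version.

```sas
/* Recreate the question's input datasets; @@ reads several values per line */
data first;  input x @@; datalines;
1 2 9
;
run;

data second; input x @@; datalines;
3 4 5 6
;
run;

/* Illustrative explicit version of the implicit data step loop */
data third_explicit;
  do while (1);     /* no exit condition of its own                     */
    set first;      /* 1st SET: next observation from FIRST             */
    output;
    set second;     /* 2nd SET: next observation from SECOND            */
    output;
  end;              /* the step ends as soon as either SET runs out     */
run;
```

On the fourth pass `set first` finds no fourth observation, so third_explicit should hold 1 3 2 4 9 5, and the 6 in second is never read.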

If you wanted observations in third to be in the order of:

1, 2, 9, 3, 4, 5, 6

Then you need to change the data step to use just one SET statement that drip-feeds both datasets, one after the other:

data third;
  set first second ;
  output;
run;
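As a small variant (the dataset name third_concat is hypothetical), the explicit OUTPUT can be dropped here, because without any OUTPUT statement the data step writes one observation automatically at the end of each iteration. The PUT statement is only there so the log shows one line per iteration, seven in this case, before the combined list of observations runs out.

```sas
/* Recreate the question's input datasets; @@ reads several values per line */
data first;  input x @@; datalines;
1 2 9
;
run;

data second; input x @@; datalines;
3 4 5 6
;
run;

data third_concat;
  set first second;  /* all of FIRST, then all of SECOND             */
  put _n_= x=;       /* one log line per iteration                   */
run;                 /* implicit OUTPUT at the end of each iteration */
```

third_concat should then contain 1 2 9 3 4 5 6.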

I think the two SET statements read observations from both datasets in turn: on each iteration, one from first and then one from second.

So in the PDV, on the first iteration (_N_=1) x is first . (missing) and then 1 (from first); on the second iteration (_N_=2) x is first 3 (retained from second) and then 2 (from first), and so on. Each of these values ends up in third because of the two explicit OUTPUT statements.

It becomes clearer if you use PUT statements:

data third;
  put _all_;
  set first;
  output;
  put _all_;
  set second;
  output;
run;

The same interleaving happens when you read the second dataset followed by the first.
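A minimal sketch of that swapped order, assuming the question's data (the dataset name third_swapped is hypothetical, and the inputs are recreated with `@@` list input for brevity). The step still stops at whichever SET statement runs out first, but now that statement comes second in the step, so the 6 from second is written before the step ends.

```sas
/* Recreate the question's input datasets; @@ reads several values per line */
data first;  input x @@; datalines;
1 2 9
;
run;

data second; input x @@; datalines;
3 4 5 6
;
run;

/* Same two-SET step with the SET statements swapped */
data third_swapped;
  set second;   /* executes first on every iteration                 */
  output;
  set first;    /* FIRST has no 4th observation: step stops here     */
  output;
run;
```

third_swapped should contain 3 1 4 2 5 9 6: on the fourth iteration `set second` still reads 6 and outputs it, and only then does `set first` hit the end of its data.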


Because that is what you told it to do?

SAS executes the data step until it reads past the end of an input file (or it detects an infinite loop). In your case it stops when the first SET statement tries to read a fourth observation from first. Hence it never gets to the second SET statement on that fourth iteration. With only two observations in first, the same thing happens on the third iteration, so you get 1 3 2 4.
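A minimal sketch of the two-value case from the question, using hypothetical dataset names (first_two, third_two) so the original first is not overwritten. The PUT at the top of the step writes a log line at the start of every iteration, which shows that a third iteration begins but ends at `set first_two` before anything else is written.

```sas
/* Hypothetical FIRST_TWO: the question's case with only 1 and 2 */
data first_two; input x @@; datalines;
1 2
;
run;

data second;    input x @@; datalines;
3 4 5 6
;
run;

data third_two;
  put 'Top of iteration ' _n_=;  /* marks the start of each pass         */
  set first_two;  /* iteration 3: no 3rd observation, so the step stops  */
  output;
  set second;
  output;
run;
```

third_two should contain 1 3 2 4, while the log shows three "Top of iteration" lines.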