1
votes

In many cases, one can choose any order for statements and options within SAS procedures.

For instance, regarding statement order, the following two PROC FREQ steps, in which the order of the BY and TABLES statements is swapped, are equivalent:

PROC SORT DATA=SASHELP.CLASS OUT=class;
    BY Sex;
RUN;

PROC FREQ DATA=class;
    BY Sex;
    TABLES Age;
RUN;
PROC FREQ DATA=class;
    TABLES Age;
    BY Sex;
RUN;

Similarly, regarding option order, the following two PROC PRINT steps, in which the order of the OBS= and FIRSTOBS= options is swapped, are equivalent:

PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);
RUN;
PROC PRINT DATA=SASHELP.CLASS (OBS=5 FIRSTOBS=2);
RUN;

But there are some exceptions.

For instance, regarding option order, the following two PROC PRINT steps differ only in where the NOOBS option is placed. The first is correct, while the second, in which NOOBS precedes the parentheses, results in an error:

PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;
RUN;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);
RUN;

Similarly, regarding statement order, I have occasionally encountered cases where a certain statement must be placed before one or more other statements. Unfortunately, I don't remember in which procedure (probably a statistical one, for duration or multilevel models).

The ordering question within data steps might be seen as a completely different matter, because there the order of statements is mostly a matter of logic. Still, the ordering of some statements looks partly conventional, as within procedures. That is the case, for instance, in the following merge, where the MERGE statement must precede the BY statement, although I suppose SAS could have been designed to understand these statements in either order:

/* To get a simple merge example, I start by artificially splitting the Class dataset into two parts */
PROC SORT DATA=SASHELP.CLASS OUT=class;
    BY Name;
RUN;
DATA sex_and_age;
   SET class (KEEP=Name Sex Age);
RUN;
DATA height_and_weight;
   SET class (KEEP=Name Height Weight);
RUN;
DATA all_variables;
   MERGE sex_and_age  height_and_weight;
   BY Name;
RUN;

Because I have been unable to find such a guide, my question is: is there a text devoted to the required order of statements and options within SAS procedures?

3
Not clear to me what your question is. Your first point is just a misunderstanding of dataset option syntax. Not sure what the second point is. Data steps are essentially programs, so the order of statements (order of operations) will definitely affect how they work. – Tom

3 Answers

1
votes

Joel,

Let me address the NOOBS example to help clarify. Take the two statements:

PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5) NOOBS;

PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);

The items in parentheses are dataset options, and they affect how the dataset is read. There are a number of them, including KEEP, DROP, WHERE, etc. NOOBS is not a dataset, so placing the parenthesized options after it gives an error. Dataset options must come directly after the dataset name.
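As a minimal sketch of how this looks with other dataset options, the step below combines KEEP= and WHERE= with NOOBS. The choice of variables and the filter value are only illustrative. The parenthesized list stays attached to the dataset name, and NOOBS sits outside it as an option of the PROC PRINT statement:

```sas
/* KEEP= and WHERE= are dataset options: they belong in the parentheses
   directly after the dataset name and control how SASHELP.CLASS is read.
   Sex is kept because the WHERE= condition refers to it. */
PROC PRINT DATA=SASHELP.CLASS (KEEP=Name Sex Age WHERE=(Sex='F')) NOOBS;
RUN;
```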

In a data step, the order of statements is often important because it determines how the PDV (program data vector) is set up. That is why an ATTRIB statement should be at the top of a data step. In some procs the order doesn't matter, since the statements are all combined for execution.

data test;
   attrib myNewVar   length=$8 format=$20.
          myNewVar2  format=date.
          ;
   set sashelp.class;
   myNewVar = 'Hey Joel!';
   myNewVar2 = '24FEB2020'd;
run;
0
votes

A parenthesized list of name=value pairs after a data set specifier is known as data set options. You therefore need to anticipate how the SAS parser will read the statement.

* (...) applies to SASHELP.CLASS;
PROC PRINT DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);

* (...) appears where an option name or option name=value is expected, so an error ensues;
PROC PRINT DATA=SASHELP.CLASS NOOBS (FIRSTOBS=2 OBS=5);

* (...) applies to SASHELP.CLASS, NOOBS is in a proper option location within the PROC statement;
PROC PRINT NOOBS DATA=SASHELP.CLASS (FIRSTOBS=2 OBS=5);

Any special statement ordering is described in the documentation for that PROC. Some procs share common syntax, and the documentation will redirect you.

0
votes

Your first point appears to be caused by not understanding what dataset options are. Otherwise, the order of the optional parts of a statement (like PROC PRINT) will be specified in the documentation for that statement.

On the second point, it appears you are confusing the purpose of the BY statement in a PROC step with its purpose in a DATA step. In a PROC step, the BY statement tells the procedure to process the data in groups. In a DATA step, the BY statement must be linked to a specific MERGE/SET/UPDATE statement.
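A small sketch of the DATA step case, using a SET statement rather than MERGE. The output dataset name first_per_sex is just a placeholder chosen for this example. Here BY is tied to the SET statement that precedes it and makes the automatic variables FIRST.Sex and LAST.Sex available:

```sas
/* BY-group processing requires the input to be sorted by the BY variable */
PROC SORT DATA=SASHELP.CLASS OUT=class;
    BY Sex;
RUN;

/* The BY statement applies to the SET statement just before it
   and creates the automatic variables FIRST.Sex and LAST.Sex */
DATA first_per_sex;
    SET class;
    BY Sex;
    IF FIRST.Sex;   /* keep only the first observation of each Sex group */
RUN;
```

In a PROC step, by contrast, BY Sex would simply repeat the analysis once per group.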