2 votes

I could not find any information about this problem; perhaps I was not able to phrase the question correctly.

Let me ask the question with code:
Is this operation

data work.tmp;
    set work.tmp;
    * some changes to data here;
run;

or especially

proc sort data = work.tmp out = work.tmp;
    by x;
run;

dangerous in any way, or considered bad practice in SAS? Note that the input and output dataset names are the same, which is my main point. Does SAS handle this situation correctly, so that there is no risk of ambiguous results when running this kind of data step/procedure?


2 Answers

3 votes

The latter, sorting a dataset into itself, is done fairly frequently. Since sort just re-arranges the dataset, it does no permanent harm (unless you were depending on the previous order, or you use a where clause or rename/keep/drop options that alter the data), so it's not considered bad practice, as long as tmp is in work (or in a libname intended to be used as a working area). SAS creates a temporary file to do the sort and, when it succeeds, deletes the old dataset and renames the temporary file; there is no substantial risk of corruption.
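
For example (an illustrative sketch, not taken from the question), adding a where statement to an in-place sort is one of the cases that does permanent harm, because the filtered-out rows are gone for good:

proc sort data = work.tmp out = work.tmp;
    by x;
    where x > 0;    /* observations failing the filter are permanently removed from work.tmp */
run;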

The former, setting a dataset to itself in a data step, is usually not considered good practice. A data step often does something irreversible - i.e., running it once has a different result than running it twice - so you risk not knowing what state your dataset is in. With sort you can usually rely on knowing, because most of the time you get an obvious error if the data are not properly sorted; with a data step you might never know. As such, each data step should generally produce a new dataset (at least, new to that thread). There are times when overwriting is necessary, or at least when it would be substantially wasteful not to - perhaps a macro that sometimes does a long data step and sometimes doesn't - but usually you can program around it.
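
A minimal sketch of the usual safer pattern, reusing the code from the question (tmp2 is just an illustrative name):

data work.tmp2;
    set work.tmp;
    * some changes to data here;
run;

Re-running this step always starts from the unchanged work.tmp, so the result is the same no matter how many times it runs.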

It's not dangerous in the sense that the file system will get confused, though; as with sort, SAS simply creates a temporary file, fills the new dataset, then deletes the old one and renames the temporary file into its place.

(I leave aside things like modify, which must set a dataset to itself, as that has an obvious answer...)
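
For completeness, a rough sketch of that modify case, which updates the dataset in place by design (the variable x is assumed from the question):

data work.tmp;
    modify work.tmp;
    x = x + 1;    /* an implicit replace writes each updated row back in place */
run;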

2 votes

Some examples of why this is not considered good practice. Say you're working interactively and you create the following dataset named tmp:

data tmp;
  set sashelp.class;
run;

If you were to run the code below twice, it would run fine the first time, but on the second run you would receive a warning, since the variable age no longer exists in that dataset:

data tmp;
  set tmp;
  drop age;
run;

In this case it's a pretty harmless example, and you are lucky enough that SAS only issues a warning. Depending on what the data step is doing, though, it could just as easily be something that generates an error, e.g.:

data tmp;
  set tmp (rename=(age=blah));
run;

Or, even worse, it may generate no ERROR or WARNING at all and simply change the expected results, like the code below:

data tmp;
  set tmp;
  weight = log(weight);
run;

Our intention is to apply a simple log transformation to the weight variable in preparation for modeling, but if we accidentally run the step a second time, we end up calculating log(log(weight)). No warnings or errors are given, and looking at the dataset it will not be immediately obvious that anything is wrong.

IMO, you are much better off creating iterative datasets, i.e. tmp1, tmp2, tmp3, and so on, for every process that updates the dataset in some way. Space is much cheaper than time spent debugging.
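
For example, a sketch of that approach applied to the log transformation above; because the step writes to a new name, running it twice cannot compound the transformation:

data tmp1;
  set tmp;
  weight = log(weight);  /* tmp itself is untouched, so weight is only logged once */
run;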