
I am reading some code that uses the SORTEDBY= data set option, and it confuses me a little.

In the first part of the code, the following step is used to read in the data:

 data _quotes / view=_quotes;
    set taq.&yyyymmdd:;
    by symbol date time NOTSORTED ex;
    length EXN 3.;
 run;

Note the NOTSORTED option here. When I remove it, SAS returns this error: ERROR: BY variables are not properly sorted on data set.

Based on my understanding of how NOTSORTED works, this means the TAQ data set is not properly sorted, but its observations are properly grouped.

However, in the step that follows almost immediately (no sorting happens in between), there is no NOTSORTED option, yet there is no error:

 data &outset (sortedby=SYMBOL DATE TIME index=(SYMBOL)
               label="WRDS-TAQ NBBO Data");
    set _quotes;
    by symbol date time;
 run;

So I am wondering whether the SORTEDBY= option is what makes the difference. From the SAS documentation, it seems that SORTEDBY= does NOT sort the data set; it only records how the data are currently sorted.

But then why does a BY statement without NOTSORTED work in the second step but not in the first?

I'll add a C to my RTM: Read the Correct Manual. Why are you reading the SAS V8 docs? I assume you're not using 20-year-old software. support.sas.com/documentation/cdl/en/ledsoptsref/69751/HTML/… - Reeza
More specifically, and linked in the docs above: support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/… - Reeza
@Reeza - not sure I follow how that answers the OP's question? - Joe
The sort indicator is set when a data set is sorted by a SORT procedure, an SQL procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement, or a SORTEDBY= data set option. ... If the SORTEDBY= data set option was used to sort the data set, which is being sorted by the user, the CONTENTS procedure output indicates the Validated sort information is set to NO and the Sortedby sort information is updated with the variable or variables specified by the data set option. - Reeza
Run a PROC CONTENTS to check how they differ. - Reeza
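A minimal sketch of that check, assuming &outset resolves to the output table name exactly as in the question's code. Going by the documentation quoted above, the sort information section of the output should list SYMBOL DATE TIME under Sortedby with Validated set to NO, because the order was declared with SORTEDBY= rather than produced by PROC SORT.

```sas
/* Inspect the sort metadata of the table written by the question's */
/* second step. &outset is assumed to hold that table's name.       */
proc contents data=&outset;
run;
```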

1 Answer


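Note that the variable ex is not on the BY statement in the second step. If the source data are sorted by symbol date time but not by ex within those groups, what you are observing makes sense: without NOTSORTED, the first step's BY statement also requires ex to be in ascending order within each symbol date time group, while the second step's BY statement does not.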
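A small self-contained illustration of this explanation. The rows below are made up (not real TAQ records): they are in symbol date time order, but ex goes from N to B inside the first symbol/date/time group. The data set names work.demo, work.ok and work.fails are placeholders.

```sas
/* Hypothetical sample: ordered by symbol date time, but ex is */
/* descending (N, then B) within the first group.              */
data work.demo;
   input symbol $ date :yymmdd8. time :time8. ex $;
   format date yymmdd10. time time8.;
   datalines;
AAA 20170103 09:30:00 N
AAA 20170103 09:30:00 B
AAA 20170103 09:30:01 P
BBB 20170103 09:30:00 Q
;
run;

/* Expected to run cleanly: the rows really are in symbol date time order. */
data work.ok;
   set work.demo;
   by symbol date time;
run;

/* Expected to stop with "BY variables are not properly sorted", */
/* because ex is out of order inside the first group.            */
data work.fails;
   set work.demo;
   by symbol date time ex;
run;
```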

It's unusual to put the NOTSORTED option in the middle of the list of variables. No matter where it is placed, it applies to the entire list. Here, the author may have intended to signal to the reader which variable is not sorted, but I find this style confusing.
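One way to confirm that placement does not matter is a sketch like the one below, run against the question's _quotes view. The output names _a and _b and the flag first_ex are hypothetical. If NOTSORTED applies to the whole list in both forms, the BY groups (and therefore the FIRST.EX flags) should be identical and PROC COMPARE should report no differences.

```sas
/* NOTSORTED in the middle of the BY list, as in the question. */
data _a;
   set _quotes;
   by symbol date time notsorted ex;
   first_ex = first.ex;   /* 1 on the first row of each BY group */
run;

/* NOTSORTED at the end of the BY list. */
data _b;
   set _quotes;
   by symbol date time ex notsorted;
   first_ex = first.ex;
run;

/* Compare the two results row by row. */
proc compare base=_a compare=_b;
run;
```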

To check, add ex to the BY statement in the second step and see whether it throws an error.
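A minimal version of that check, assuming the question's _quotes view is available. DATA _NULL_ is used so no output table is written. If ex is the variable that is out of order, the log should show the same "BY variables are not properly sorted" error as the first step did without NOTSORTED.

```sas
/* Same input as the question's second step, with ex added to the BY list. */
data _null_;
   set _quotes;
   by symbol date time ex;
run;
```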