1
votes

I am using a macro to loop through files by name and extract data. This works fine in the majority of cases, but from time to time I get the following error:

ERROR: BY variables are not properly sorted on data set CQ.CQM_20141113.

where CQM_20141113 is the file I am extracting data from. In fact my macro loops through CQ.CQM_2014: and it works up until 20141113. Because of this single failure the file is then not created.

I use a data step view to "initialize" the data and then read that view in a further data step (code sample with shortened where conditions):

%let taq_ds = CQ.CQM_2014:;

data _v_&tables / view=_v_&tables;
     set &taq_ds;
     by sym_root date time_m; /* <= added BY statement */
     format sym_root date time_m;
     where sym_root = &stock;   
run; 

data xtemp2_&stockfiname (keep = sym_root year date iprice);
     retain sym_root year date iprice; 
     set _v_&tables; 
     by sym_root date time_m;

/* some conditions */
run;

When I see the error in the log and run the program again, it works (sometimes it takes a few attempts).

I was thinking of using proc sort, but how can I do that with a data step view? Please note that the CQM files are very large (which could also be the root of the problem).

Edit: taq_ds is not a single file but covers several files whose names start with CQM_2014, i.e. CQM_20140101, CQM_20140102, etc.

2
You could index &taq_ds. on sym_root date time_m, or create a SQL view with an ORDER BY clause. – Richard

2 Answers

4
votes

Based on the code provided, you could replace your first data step view with a SQL one:

proc sql;
create view _v_&tables as
  select * from &taq_ds
  where sym_root = &stock
  order by sym_root, date, time_m;
quit;

Alternatively you could prefix your data step view with a similar view. This would enforce the ordering needed for the subsequent by statement.
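One way to combine the two kinds of view is sketched below. It is a variation rather than exactly what is described above: the data step view comes first, because as far as I know the FROM clause in PROC SQL does not accept a name-prefix list such as CQ.CQM_2014:, whereas the SET statement does. The view names _v_raw and _v_sorted are hypothetical, and &stock is assumed to resolve to a quoted string as in the question.

```sas
/* Sketch only: _v_raw and _v_sorted are hypothetical names. */
%let taq_ds = CQ.CQM_2014:;

/* 1. Data step view: concatenates the daily files and filters them. */
/*    No BY statement here, so no sort order is required yet.        */
data _v_raw / view=_v_raw;
     set &taq_ds;
     where sym_root = &stock;  /* assumes &stock resolves to a quoted string */
run;

/* 2. SQL view: imposes the order that the later BY statement needs. */
proc sql;
create view _v_sorted as
  select * from _v_raw
  order by sym_root, date, time_m;
quit;

/* 3. Final step: reads the ordered view, so BY processing is safe. */
data xtemp2_&stockfiname (keep = sym_root year date iprice);
     retain sym_root year date iprice;
     set _v_sorted;
     by sym_root date time_m;

/* some conditions */
run;
```

The sort only happens when _v_sorted is read in step 3, and because the WHERE filter is applied first, only the rows for one stock are sorted rather than the full CQM files.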

1
votes

Creating an index on taq_ds corresponding to the by group order would also solve this, e.g.:

proc datasets lib=<library containing taq_ds>;
modify taq_ds;
index create index1=(sym_root date time_m);
run;
quit;
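Since taq_ds covers many daily files rather than one data set, each member would need its own index. A minimal sketch of one way to do that is below, assuming the library is CQ and the members all start with CQM_2014. The macro name index_all and the macro variables members and n_members are hypothetical, and the step would fail for any member that already has an index called index1.

```sas
/* Sketch only: collect the member names, then index each one. */
proc sql noprint;
  select memname into :members separated by ' '
  from dictionary.tables
  where libname = 'CQ'
    and memtype = 'DATA'
    and memname like 'CQM_2014%';
quit;
%let n_members = &sqlobs;  /* number of matching data sets */

%macro index_all;
  %local i ds;
  proc datasets lib=CQ nolist;
  %do i = 1 %to &n_members;
    %let ds = %scan(&members, &i, %str( ));
    modify &ds;
    index create index1=(sym_root date time_m);
  %end;
  run;
  quit;
%mend index_all;

%index_all
```

Building a composite index on very large files takes time and disk space, so this is probably only worthwhile if the files are read in this BY order repeatedly.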