1
votes

I have created proc sql query in SAS program, but need to use LAG function and it tell me it can't be used in proc sql, just in data step. Code:

proc sql;
CREATE TABLE agg_table AS
SELECT USER, MAX(TIME) AS LAST_TIME, SUM(BONUS) AS BONUS_SUM, LAG(EXPDT) AS EXPDT_LAG FROM WORK.MY_DATA GROUP BY USER_ID;

So, I don't how to combine proc sql and datastep into one query to get one table as an output? Or maybe there is a better approach to the whole problem? Thanks

1
In base SAS, I don't think you can combine sum and lag in a single SQL query / data step unless you want to write down a rather complicated code retaining an array of lagged observations. If there is no row-number column in MY_DATA that could be used to join the LAG(EXPDT), you can for example first calculate LAG(EXPDT) in data step and then calculate the aggregate columns in proc sql. - Petr
If you want useful help for solving this problem, include sample input data and sample desired output data. - Joe
Look into PROC EXPAND as well to do stats for time series data. - Reeza

1 Answers

3
votes

PROC SQL does not have a concept of rows the same way the datastep does. SQL may process rows in any order, not necessarily sequential, and may use hash tables, parallel processing, or various binary tree and similar methods to process its query; and the same query may be processed in different methods. Thus lag is not usable in SQL, nor are diff or other functions that expect row data.

It's unclear from your question what exactly you're doing, so it's not really possible to give a direct answer how to do this separately; but you may be able to accomplish this entirely in one datastep, or you may combine a datastep and a SQL query, or two datasteps. You can perform the lag in a prior datastep or a view, then the rest in SQL; or you may use a DoW loop datastep to perform the max/sum elements.