I am currently trying to create a column within a BigQuery table that simply includes the lagged values of the hits.page.pagePath. For this example, imagine that I have hits.time values of 0, 513, 518, 762, 991. These values correspond to a unique_visit_id which I created. There are thousands of unique visits per day, each with different hits.time values and row numbers.
To help with a variety of calculations, I want to add a column to the data frame that simply includes the lagged value for all rows in the hits.page.pagePath column (with there being no lag associated when hits.time > 0). Note that the hits.time is a repeated group that refers to a unique visit id which is calculated using CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id. Thus, I suspect that a command like OVER would be involved. My initial query used the following approach:
SELECT date, LAG(hits.page.pagePath,1), CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces,
hits.page.pagePath, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;
Thus unfortunately returned the error:
Error: LAG is an analytic function and must be accompanied by an OVER clause.
Job ID: XXXXXXXX-XXX:job_zduhuRKgvrgmA7niBzcyb3empwY
I have guessed that OVER would reference the CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id since each row containing the same unique_visit_id represents a hit inside a unique visit. Unfortunately, using an OVER here has also returned an error.