4
votes

I am currently trying to create a column within a BigQuery table that simply includes the lagged values of the hits.page.pagePath. For this example, imagine that I have hits.time values of 0, 513, 518, 762, 991. These values correspond to a unique_visit_id which I created. There are thousands of unique visits per day, each with different hits.time values and row numbers.

To help with a variety of calculations, I want to add a column to the data frame that simply includes the lagged value for all rows in the hits.page.pagePath column (with there being no lag associated when hits.time > 0). Note that the hits.time is a repeated group that refers to a unique visit id which is calculated using CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id. Thus, I suspect that a command like OVER would be involved. My initial query used the following approach:

SELECT date, LAG(hits.page.pagePath,1), CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces, 
hits.page.pagePath, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;

Thus unfortunately returned the error:

Error: LAG is an analytic function and must be accompanied by an OVER clause.

Job ID: XXXXXXXX-XXX:job_zduhuRKgvrgmA7niBzcyb3empwY

I have guessed that OVER would reference the CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id since each row containing the same unique_visit_id represents a hit inside a unique visit. Unfortunately, using an OVER here has also returned an error.

1

1 Answers

6
votes

Analytic functions have an odd syntax... you're almost there, but you just need to move the ORDER BY clause into the OVER statement used by LAG():

SELECT date, LAG(hits.page.pagePath,1) 
  OVER(PARTITION BY unique_visit_id ORDER BY hits.time ASC) as lagged, 
  CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, 
  ...
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
LIMIT 1000;