9
votes

can anyone help with a BIgQuery SQL query to extract the time on page for a specific page from Google Analytics data please?

For every visitorId who has visited a particular page I would like the time on page for that page. This is so that I can calculate the median time on page rather than the mean.

I'm assuming that the visitorId, hits.hitsNumber and hits.time dimensions will be needed. Also that somehow the hits.time for the hit where the page was viewed will need to be subtracted from the hits.time of the following hit.

Any help much appreciated.

2

2 Answers

8
votes

Try this:

SELECT 
  fullVisitorId,
  hits.page.hostname,
  hits.page.pagePath,
  hits.hitNumber,
  hits.time,
  nextTime,
  (nextTime - hits.time) as timeOnPage
FROM(
  SELECT
    fullVisitorId, 
    hits.page.hostname,
    hits.page.pagePath,
    hits.hitNumber,
    hits.time,
    LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitNumber ORDER BY hits.time ASC) as nextTime
  FROM [PROJECTID:DATASETID.ga_sessions_YYYYMMDD]
  WHERE hits.type = "PAGE"
)

The key to this code is the LEAD() function, which grabs the specified value from the next row in the partition, based on the PARTITION BY and ORDER BY qualifiers.

Hope that helps!

0
votes

To Account for last page time, this query can be used, and it will give zero time on last page,since BQ doesn't have way to calculate time spent on last page, but it will at least give zero instead of null.

SELECT 
  fullVisitorId,
  hits.page.hostname,
  hits.page.pagePath,
  hits.hitNumber,
  hits.time,
  nextTime,
  CASE
    WHEN hits.isExit IS NOT NULL THEN last_interaction - hit_time
    ELSE next_pageview - hit_time
  END
  AS time_on_page
FROM(
  SELECT
    fullVisitorId, 
    hits.page.hostname,
    hits.page.pagePath,
    hits.hitNumber,
    hits.isExit,
    hits.time/1000 as hits_time,
    LEAD(hits.time/1000, 1) OVER (PARTITION BY fullVisitorId, visitNumber ORDER BY hits.time ASC) as nextTime,
    MAX(IF(hits.isInteraction = TRUE,hits.time / 1000,0)) OVER (PARTITION BY fullVisitorId, visitStartTime) AS last_interaction
  FROM [PROJECTID:DATASETID.ga_sessions_YYYYMMDD]
  WHERE hits.type = "PAGE"
)