0
votes

I have managed to calculate total events by ISOweek but not unique events for a given Google Analytics Event using BigQuery. When checking GA, total_events matches the GA interface on the dot but unique_events are off. Do you know how I can solve this?

The query:

SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(fullVisitorId) AS unique_events
FROM
    TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
  hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction
4
what happens if you add AND totals.visits=1 in your where clause? Does it work? - Willian Fuks
Hi @Will the unique_events are still off when adding the AND totals.visits=1. Do you know other things I could try? - Maarten Berge
hmm not sure why it's not working then. Other things would be to make sure that your TABLE_DATE_RANGE is not including intradays and ga_sessions table at the same time (what would cause data to duplicate), also maybe make sure if the analyzes in GA is done correctly. Other than that I can't see any error in this query (if there's one, i can't see it). Also, I recommend you to use the standard version of BQ. cloud.google.com/bigquery/docs/reference/standard-sql It's much better. - Willian Fuks
@will "TABLE_DATE_RANGE is not including intradays and ga_sessions table at the same time (what would cause data to duplicate)" With the date range set in past would BigQuery still evaluate intraday or was that just in case I was including the todays date in the range? Thanks for the standard SQL tip. - Maarten Berge
Sometimes the intraday tables are not deleted so depending on how you select past tables you can end up querying both. - Willian Fuks

4 Answers

3
votes

Depending on the scope you need to count(distinct ) different things, but you always need to fulfill these conditions:

  • unique events refer to the combination of category, action and label
  • make sure eventAction is not NULL
  • make sure eventLabel is not NULL
  • eventCategory is allowed be NULL

I'm using COALESCE() to avoid NULLs

Example Session Scope

SELECT
  SUM( (SELECT COUNT(h.eventInfo.eventCategory) FROM t.hits h) ) events,
  SUM( (SELECT COUNT(DISTINCT 
    CONCAT( h.eventInfo.eventCategory,
      COALESCE(h.eventinfo.eventaction,''),
      COALESCE(h.eventinfo.eventlabel, ''))
      )
    FROM
      t.hits h ) ) uniqueEvents
FROM
  `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t

Example Hit Scope

SELECT
  h.eventInfo.eventCategory,
  COUNT(1) events,
  -- we need to take sessions into account, so we add fullvisitorid and visitstarttime
  COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string), 
    COALESCE(h.eventinfo.eventaction,''), 
    COALESCE(h.eventinfo.eventlabel, ''))) uniqueEvents
FROM
  `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t,
  t.hits h
WHERE
  h.type='EVENT'
GROUP BY
  1
ORDER BY
  2 DESC

hth!

1
votes

The definition of unique events in Google Analytics is:

A count of the number of times an event with the category/action/label value was seen at least once within a session.

In other words, the number of sessions in which a specific event (defined by category, action AND label) was sent. In your query, you count the number of unique visitors that had the event, while you need to count the number of sessions and keep in mind that events with different labels should be counted as different unique events (although we are only interested in category and action).

A possible way to fix your code is:

SELECT 
  INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
  hits.eventInfo.eventCategory,
  hits.eventInfo.eventAction,
  COUNT(hits.eventInfo.eventCategory) AS total_events,
  EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId,'-',string(visitId),'-',date,'-',ifnull(hits.eventInfo.eventLabel,'null'))) AS unique_events    
FROM
  TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
  hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
    iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction

The results of this query should match with the data in the GA interface.

0
votes

I believe the issue is that you are only counting the number of unique visitors have completed the specified action, while GA defines unique events as "The number of times during a date range that a session contained the specific dimension".

Therefore, I would just change your code to the below:

SELECT INTEGER(STRFTIME_UTC_USEC(PARSE_UTC_USEC(date),"%V")) iso8601_week_number,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
COUNT(hits.eventInfo.eventCategory) AS total_events,
EXACT_COUNT_DISTINCT(CONCAT(fullVisitorId, STRING(visitId))) AS unique_events
FROM
    TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2017-05-01'), TIMESTAMP('2017-05-07'))
WHERE
  hits.type = 'EVENT' AND hits.eventInfo.eventCategory = 'BIG_Transaction'
GROUP BY
iso8601_week_number, hits.eventInfo.eventCategory, hits.eventInfo.eventAction

This should give you the distinct count of sessions that had the given events.

0
votes

We did something similar to what @Martin was suggesting with some cool CTEs and we were able to get an 100% match on what was coming out of Google Analytics from BigQuery.

Checkout the code snippet below that returns a per day sum of sessions + unique Add to Cart events:

   #standardSQL
WITH AN_ATC AS 
(
  SELECT
    -- full date w/ hyphens (ie 2021-01-07)
      CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
      -- COUNT OF SESSIONS
      COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS Sessions,
      -- COUNT OF UNIQUE EVENTS PER SESSION
       COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string), 
        COALESCE(hits.eventinfo.eventaction,''), 
        COALESCE(hits.eventinfo.eventlabel, ''))) AS EVENTS
  
  FROM `an-big-query.PROJECT_ID.ga_sessions_*` , 
  UNNEST(hits) as hits
  WHERE
  -- start date
  _table_suffix BETWEEN '20190101'
  -- yesterday
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(),INTERVAL 1 DAY)) 
  AND hits.eventInfo.eventAction = 'add to cart'
  GROUP BY 
    date
)
SELECT
DATE,
SESSIONS,
EVENTS
FROM AN_ATC
ORDER BY date DESC

Where,

SESSIONS = Google Analytics ga:Sessions

and

EVENTS = Google Analytics ga:uniqueEvents

BOTH with eventAction=@add to cart

Hope that helps everyone that was searching/googling!