0
votes

I am a newbie in Google Analytics / Big Query area. I am trying to understand how to count sessions in Big Query, but I can't make sense out of the data that I am getting.

As I understand,

  1. Each row in BQ = one session
  2. Session Identifier in BQ = visitId + fullvisitorId
  3. There is a situation in which Google Analytics sessions are lower than Big Query sessions, because GA automatically filters out sessions with no interaction events

The thing is, when I run a query with this understandings, this happens:

  • Sessions in GA = 7.763.228 sessions
  • Sessions in BQ with distinct visitId + fullvisitorId = 7.750.579 (lower than GA, which shows that the #3 understanding is not the case)
  • Sessions in BQ summing totals.visits = 7.763.228 (same as GA)
  • Number of rows = 7.763.592 (because includes 364 sessions with visits=NaN)

What I want to know, is which data can I trust most, and what is the real unique identifier for each session (as the numbers are not equal).

Also, I noticed that visitId + fullvisitorId is not giving me a unique identifier of the session, as I have 13.013 visitId + fullvisitorId duplicates among the data. Do someone have any idea, or an explanation?

Thanks in advance.

1
Broadly speaking BigQuery should be more accurate than GA, because it will never be sampled. For me, COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING))) matches sessions in GA. Note that totals.visits will match GA because it excludes non-interaction sessions. - Ben P

1 Answers

0
votes

In Google Analytics sessions break at midnight each day so that you have no crossing sessions. In those cases, the visitId field, which is simply a timestamp indicating when the user session started, still has the same value from the first session. So you can have two sessions with the same fullVisitorId + visitId combination if it does cross the midnight limit.

If you need a unique identifier that matches the data from Google Analytics, you should use visitStartTime + fullVisitorId. visitStartTime works the same as visitId, but it is set again if the session breaks at midnight. You also need to filter rows with totals.visits different than 1, as you already pointed out.

Unfortunately, I don't know of any official resources that point to this distinction, but both community discussions and testing confirm this.