I am a newbie in Google Analytics / Big Query area. I am trying to understand how to count sessions in Big Query, but I can't make sense out of the data that I am getting.
As I understand,
- Each row in BQ = one session
- Session Identifier in BQ = visitId + fullvisitorId
- There is a situation in which Google Analytics sessions are lower than Big Query sessions, because GA automatically filters out sessions with no interaction events
The thing is, when I run a query with this understandings, this happens:
- Sessions in GA = 7.763.228 sessions
- Sessions in BQ with distinct visitId + fullvisitorId = 7.750.579 (lower than GA, which shows that the #3 understanding is not the case)
- Sessions in BQ summing totals.visits = 7.763.228 (same as GA)
- Number of rows = 7.763.592 (because includes 364 sessions with visits=NaN)
What I want to know, is which data can I trust most, and what is the real unique identifier for each session (as the numbers are not equal).
Also, I noticed that visitId + fullvisitorId is not giving me a unique identifier of the session, as I have 13.013 visitId + fullvisitorId duplicates among the data. Do someone have any idea, or an explanation?
Thanks in advance.
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)))matches sessions in GA. Note that totals.visits will match GA because it excludes non-interaction sessions. - Ben P