I've got a Standard SQL query written to pull some Google Analytics data out of BigQuery. I've pasted a simplified version below that I use for testing:
SELECT COUNT(DISTINCT(session_id))
FROM (
SELECT
CONCAT(CAST(fullVisitorId AS STRING), '.', CAST(visitStartTime AS STRING)) AS session_id
FROM `myproject.ga_sessions_20180227`, UNNEST(hits) AS hits
);
Running this query outputs a unique session count of 6,696, which is accurate when compared to the GA UI.
However, when I introduce customDimensions into the query, I lose a number of rows. Running this query only outputs 6,606 unique sessions.
SELECT COUNT(DISTINCT(session_id))
FROM (
SELECT
CONCAT(CAST(fullVisitorId AS STRING), '.', CAST(visitStartTime AS STRING)) AS session_id,
MAX(IF(customs.index = 14, customs.value, NULL)) AS custom_id
FROM `myproject.ga_sessions_20180227`, UNNEST(customDimensions) AS customs
GROUP BY fullVisitorId, visitStartTime
);
I'm not using hits.customDimensions, because the custom dimension in question is based on the user level, not the hit level. If I don't group the output, I get an error.
How can I unnest a non-hit based custom dimension without losing rows?