I'm having an issue calculating unique users in our GA BigQuery export. I've reproduced the same error using the sample data.
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
trafficSource.source,
trafficSource.medium,
COUNT(DISTINCT(fullVisitorId)) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path, source, medium
)
UNION ALL
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
COUNT(DISTINCT(fullVisitorId)) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path
)
When I include the source and medium columns, the distinct fullVisitorId count is 10 higher than without them. How does including these columns cause an increased number of fullVisitorIds? This doesn't make sense to me.
What's causing this and how would I get an accurate count?

