I am running some queries on GA data in BigQuery and I am encountering a recurring problem when I want to return a sum of data from an unnested table, where in my totals are much higher than expected - I suspect that unnested rows are being counted, resulting in an inaccurate count. here is an example:
SELECT DATE, SUM(totals.transactions)
FROM `PROJECTNAME.43786551.ga_sessions_20*` AS GBP
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY DATE
Returns:
1 20171122 12967
Which is as expected. Next, I want to use a field from hits.
, which requires me to unnest hits, making my query:
SELECT DATE, SUM(totals.transactions), MIN(hits.page.hostname) AS site
FROM `PROJECTNAME.43786551.ga_sessions_20*` AS GBP
CROSS JOIN UNNEST (hits) as hits
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY DATE
However, the results for this now show:
20171122 2320004 www.hostname.com
The count of transactions is much higher, I assume it's counting all unnested rows, how can I get around this issue, where I want to count and unnested tables, but use a field from unnested too?