0
votes

I am running some queries on GA data in BigQuery and I am encountering a recurring problem when I want to return a sum of data from an unnested table, where in my totals are much higher than expected - I suspect that unnested rows are being counted, resulting in an inaccurate count. here is an example:

SELECT DATE, SUM(totals.transactions)
FROM `PROJECTNAME.43786551.ga_sessions_20*` AS GBP
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 1 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY DATE

Returns:

1 20171122 12967

Which is as expected. Next, I want to use a field from hits., which requires me to unnest hits, making my query:

SELECT DATE, SUM(totals.transactions), MIN(hits.page.hostname) AS site
FROM `PROJECTNAME.43786551.ga_sessions_20*` AS GBP
CROSS JOIN UNNEST (hits) as hits
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 1 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY DATE

However, the results for this now show:

20171122 2320004 www.hostname.com

The count of transactions is much higher, I assume it's counting all unnested rows, how can I get around this issue, where I want to count and unnested tables, but use a field from unnested too?

1

1 Answers

0
votes

you should do probably something like this:

    SELECT DATE, SUM(totals.transactions), 
        (SELECT MIN(hit.page.hostname) FROM UNNEST (GBP.hits) AS hit) AS site
        FROM `PROJECTNAME.43786551.ga_sessions_20*` AS GBP
        WHERE parse_date('%y%m%d', _table_suffix) between 
        DATE_sub(current_date(), interval 1 day) and
        DATE_sub(current_date(), interval 1 day)
        GROUP BY DATE, site