1
votes

I'm having issues with the below query in BigQuery for Google Analytics. For some reason I'm not able to count the number of users as unique, it essentially counts the number of rows so the numbers are very similar to sessions. I've also tried EXACT_COUNT_DISTINCT() but gives me the same answer.

    SELECT
  date AS Day,
  MAX(CASE
      WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId CONTAINS '778****' THEN 'MUG'
      WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website' END) AS Property,
  geoNetwork.country AS Country,
  SUM(totals.visits) AS visits,
  COUNT (DISTINCT(fullVisitorId), 1000000) AS Users,
  SUM(IFNULL(totals.newVisits,0)) AS NEW,
  (SUM(IFNULL(totals.screenviews,0))+SUM(IFNULL(totals.pageviews,0))) AS PAGEVIEWS,
  IFNULL(SUM(CASE
        WHEN totals.screenviews = 1 THEN SUM(IFNULL(totals.screenviews,0))
        ELSE 0 END)+ SUM(IFNULL(totals.bounces,0)),0) AS BOUNCES,
  SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1
      ELSE 0 END) AS NewRegistrations,
  SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar') OR hits.eventInfo.eventAction CONTAINS 'addtobasket::' THEN 1
      ELSE 0 END) AS ClickToBuy,
  SUM(IFNULL(totals.transactions,0)) AS Transactions,
  SUM(IFNULL(totals.transactionRevenue,0))/1000000 AS Revenue
FROM (TABLE_DATE_RANGE([****.ga_sessions_], TIMESTAMP('2017-03-15'), TIMESTAMP('2017-03-31'))),
GROUP BY
  Day,
  Country,
  geoNetwork.country,
  totals.screenviews;
1
Why are you grouping by screenviews? - Elliott Brossard
@ElliottBrossard This, i think, is the problem. I try to leave it out but it keeps forcing me to - Aaron Harris
I think the problem is that you have some nested aggregations, i.e. SUM within a SUM. If you fix that logic, the query should work. I would really recommend using standard SQL for your analysis, though. You might also be interested in the migration guide. - Elliott Brossard
thanks @elliot, will give it a go. Would the subqueries within selects be the biggest advantage here? - Aaron Harris
A couple of others are that COUNT(DISTINCT ...) gives exact results (and is generally faster than EXACT_COUNT_DISTINCT), and that repeated field handling related to ga_sessions_ tables is a lot saner, though you may find that there is a learning curve. The Working with Arrays topic is a good introduction. - Elliott Brossard

1 Answers

2
votes

I just tested this query and it seems a bit simpler:

SELECT
date,
MAX(CASE
     WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId CONTAINS '778****' THEN 'MUG'
     WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
SUM(totals.visits) AS visits,
COUNT(DISTINCT(fullVisitorId), 1000000) AS Users,
SUM(totals.newVisits) AS NEW,
SUM(totals.pageviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1
      ELSE 0 END) AS NewRegistrations,
SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1
      ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue) /1000000 AS Revenue
FROM (TABLE_DATE_RANGE([project_id:dataset_id.ga_sessions_], TIMESTAMP('2017-03-15'), TIMESTAMP('2017-03-31'))),
GROUP BY
date, Country

It did work in our database (not sure why you summed screenviews with pageviews though).

In Standard SQL (highly recommended that you use this version) maybe this already solves for you:

SELECT
date,
MAX(CASE
     WHEN exists(select 1 from unnest(hits) hits where regexp_contains(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'778\*\*\*\*')) THEN 'MUG'
     WHEN exists(select 1 from unnest(hits) hits where hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
SUM(totals.visits) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
SUM(totals.newVisits) AS new_,
SUM(totals.pageviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
SUM(CASE
      WHEN exists(select 1 from unnest(hits) hits where REGEXP_contains(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
      ELSE 0 END) AS NewRegistrations,
SUM(CASE
      WHEN exists(select 1 from unnest(hits) hits where REGEXP_contains(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
      ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue) /1000000 AS Revenue
FROM `project_id.dataset_id.ga_sessions*`
where 1 = 1
and parse_timestamp("%Y%m%d", regexp_extract(_table_suffix, r'.*_(.*)')) between  TIMESTAMP('2017-03-15') and  TIMESTAMP('2017-03-31')
GROUP BY
date, Country