1
votes

I have two tables setup in Bigquery for tracking a users and daily page_views:

users
| id | email |

page_views
| user_id | visit_date | visit_path | counter |

Subqueries work great (when not aggregating):

SELECT id
FROM myDataset.users
WHERE account_id = 'ACCOUNT_ID'

-- this works great
AND EXISTS(
  SELECT 1
  FROM myDataset.pageviews
  WHERE user_id = users.id
  AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 WEEK)
  AND visit_path = '/contact'
  AND counter >= 2
  LIMIT 1
)

However, when creating queries that will SUM the count of page views over a period of time I'm seeing the error:

Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.

The goal is to write a query like this:

Show me all the users who have visited the pricing page 10 times in the last week AND the contact page 1 time in the last day

SELECT id
FROM myDataset.users
WHERE account_id = 'ACCOUNT_ID'

-- Correlated subqueries error
AND EXISTS(
  SELECT SUM(counter) visit_count_total
  FROM myDataset.pageviews
  WHERE user_id = users.id
  AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 WEEK)
  AND visit_path = '/pricing'
  HAVING visit_count_total >= 10
  LIMIT 1
)

-- Correlated subqueries error
AND EXISTS(
  SELECT SUM(counter) visit_count_total
  FROM myDataset.pageviews
  WHERE user_id = users.id
  AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND visit_path = '/contact'
  HAVING visit_count_total >= 1
  LIMIT 1
)

LIMIT 100

Note: this query is generated from a dashboard UI, so it could have multiple filter conditions.

It's not obvious to me how to support the multiple subqueries w/ different paths/counts with a JOIN.

1

1 Answers

1
votes

Since you want to filter for the SUM of the subqueries to be larger than certain number, you could try this:

#standardSQL
WITH mytable
AS (
  SELECT *, title account_id, title id, title user_id, views counter
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
  WHERE datehour='2019-01-01 01:00:00' 
  AND wiki='br'
)

SELECT account_id
FROM `mytable` a
WHERE account_id='Avignon'
AND (
    SELECT SUM(counter)
    FROM `mytable` b
    WHERE a.user_id = b.id
) > 10

Basically a subquery that returns the SUM, and in the outer query compare for the desired minimum value.