I have two tables setup in Bigquery for tracking a users and daily page_views:
users
| id | email |
page_views
| user_id | visit_date | visit_path | counter |
Subqueries work great (when not aggregating):
SELECT id
FROM myDataset.users
WHERE account_id = 'ACCOUNT_ID'
-- this works great
AND EXISTS(
SELECT 1
FROM myDataset.pageviews
WHERE user_id = users.id
AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 WEEK)
AND visit_path = '/contact'
AND counter >= 2
LIMIT 1
)
However, when creating queries that will SUM the count of page views over a period of time I'm seeing the error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
The goal is to write a query like this:
Show me all the users who have visited the pricing page 10 times in the last week AND the contact page 1 time in the last day
SELECT id
FROM myDataset.users
WHERE account_id = 'ACCOUNT_ID'
-- Correlated subqueries error
AND EXISTS(
SELECT SUM(counter) visit_count_total
FROM myDataset.pageviews
WHERE user_id = users.id
AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 WEEK)
AND visit_path = '/pricing'
HAVING visit_count_total >= 10
LIMIT 1
)
-- Correlated subqueries error
AND EXISTS(
SELECT SUM(counter) visit_count_total
FROM myDataset.pageviews
WHERE user_id = users.id
AND visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
AND visit_path = '/contact'
HAVING visit_count_total >= 1
LIMIT 1
)
LIMIT 100
Note: this query is generated from a dashboard UI, so it could have multiple filter conditions.
It's not obvious to me how to support the multiple subqueries w/ different paths/counts with a JOIN.