I'm looking to optimize a query to get active accounts per interval (day, week, etc) where the active is defined on a start and end date for the account. Layout is as below:
accounts:
account_id Int
subscription_start Timestamp
subscription_end Timestamp -- null if still active
I tried several approaches, the fastest one being using generate_array to create arrays for active days on the fly and then unnesting them (about 2 minutes)
with nested_dates as (
select GENERATE_DATE_ARRAY(DATE(subscription_start), IFNULL(date(subscription_end), current_date()), INTERVAL 1 DAY) as dates
from `accounts`
),
all_dates as (
select date_item from nested_dates, UNNEST(dates) as date_item
)
select date_item, count(1) from all_dates group by date_item
I also used a subselect per day, which used to perform quite well in similar situations 6-9 months ago when BigQuery still had the performance tiers and gave more lee way with CPU. But it seems to be much stricter / less efficient now as they removed the higher computing tier pricing. This execution takes about 12 minutes.
select
day,
(select
count(1)
from `accounts` where
subscription_start <= day and
(subscription_end is null or subscription_end >= day)
)
from unnest(
generate_date_array(date('2015-06-01'), current_date(), interval 1 day)
) AS day
I also did performance tests on cross join of accounts with generates dates and a conditional sum (fails after 3300 seconds because of CPU limit exceeded).
What works very fast is of course when I generate a materialized table of all days for each account where it has been active, then queries finish within 6-10 seconds, if I'd implement partitioning probably even faster (e.g. by using the with from first query "all_dates".
select date(day_active), count(1) from `account_all_dates`
Now the question: Is there a way to get the performance of the materialized cross join without actually blowing up the data first and materializing it, i.e. in real time without the overhead.
I was trying with some analytic function, but could not find something that could do this.