I would like to run the following query in BigQuery, ideally as efficiently as possible. The idea is that I have all of these rows corresponding to tests (taken daily) by millions of users and I want to determine, of the users who have been active for over a year, how much each user has improved.
"Improvement" in this case is the average of the first N subtracted from the last N.
In this example, N is 30. (I've also added in the where cnt >= 100 part because I don't want to consider users who took a test a long time ago and just came back to try once more.)
select user_id,
avg(score) filter (where seqnum_asc <= 30) as first_n_avg,
avg(score) filter (where seqnum_desc <= 30) as last_n_avg
from (select *,
row_number() over (partition by user_id order by created_at) as seqnum_asc,
row_number() over (partition by user_id order by created_at desc) as seqnum_desc,
count(*) over (partition by user_id) as cnt
from tests
) t
where cnt >= 100
group by user_id
having max(created_at) >= min(created_at) + interval '1 year';