Suppose I have two tables: intervals
contains index intervals (its columns are i_min
and i_max
) and values
contains indexed values (with columns i
and x
). Here's an example:
values: intervals:
+---+---+ +-------+-------+
| i | x | | i_min | i_max |
+-------+ +---------------+
| 1 | 1 | | 1 | 4 |
| 2 | 0 | | 6 | 6 |
| 3 | 4 | | 6 | 6 |
| 4 | 9 | | 6 | 6 |
| 6 | 7 | | 7 | 9 |
| 7 | 2 | | 12 | 17 |
| 8 | 2 | +-------+-------+
| 9 | 2 |
+---+---+
I want to sum the values of x for each interval:
result:
+-------+-------+-----+
| i_min | i_max | sum |
+---------------------+
| 1 | 4 | 13 | // 1+0+4+9
| 6 | 6 | 7 |
| 6 | 6 | 7 |
| 6 | 6 | 7 |
| 7 | 9 | 6 | // 2+2+2
| 12 | 17 | 0 |
+-------+-------+-----+
In some SQL engines, this could be done using:
SELECT
i_min,
i_max,
(SELECT SUM(x)
FROM values
WHERE i BETWEEN intervals.i_min AND intervals.i_max) AS sum_x
FROM
intervals
except that type of query is not allowed by BigQuery ("Subselect not allowed in SELECT clause." or "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join." depending on the syntax used).
There must be a way to do this with window functions, but I can't figure out how — all examples I've seen have the partition as part of the table. Is there an option that doesn't use CROSS JOIN? If not, what's the most efficient way to do this CROSS JOIN?
Some notes on my data:
- Both tables contain many (10⁸-10⁹) rows.
- There might be repetitions in
intervals
, not ini
. - But two intervals in
intervals
are either the same, either entirely disjoint (no overlaps). - The union of all intervals is typically close to the set of all values of
i
(so it forms a partition of this space). - Intervals might be large (say, i_max-i_min < 10⁶).