3
votes

I've seen this question, which is almost exactly what I want. But I can't get it working on Bigquery with standard SQL because BQ doesn't allow user defined variables.

Note - I have an arbitrary number of groups, so UNIONing all of them as per the first answer in the linked question is not feasible.

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:

Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in any order)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

Desired result set:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+

One answer in the linked question mentions using ROW_NUMBER which does exist, but I can't work out how to restart the number for each group.

2

2 Answers

8
votes

This is row_number():

select t.*
from (select t.*,
             row_number() over (partition by group order by age desc) as seqnum
      from t
     ) t
where seqnum <= 2;

row_number() is an ANSI standard window function. It is available in most databases. In general, I would suggest that you look more for solutions using Postgres rather than MySQL for solving problems in BQ (if you can't find a BQ resource itself).

6
votes

You can use ARRAY_AGG with a limit. For example,

#standardSQL
SELECT
  `Group`,
  ARRAY_AGG(STRUCT(Person, Age)
            ORDER BY Age DESC LIMIT 2) AS oldest_people
FROM People
GROUP BY `Group`;