0
votes

Here is my query that works:

SELECT a_name, w_name, page_url, SUM(IF(result = 'WIN', 1, 0)) as Impressions,

from TABLE_DATE_RANGE(v3_data.v3_,TIMESTAMP('2015-08-05'),TIMESTAMP('2015-08-07'))

WHERE server ="server1" and w_id IN (1096, 1105, 1136,1153,1189,1209)

GROUP EACH BY 1,2,3

ORDER BY Impressions DESC

limit 1000000

Now my issue is I dont want to use a limit.

What I would like to do is choose the top 1000 results from each w_id, ordered by the impressions.

There are multiple w_ids and this query will be extended to multiple servers too.

originally I thought id try:

select top 1000 * from [code above] group by 1,2,3

but top doesnt work like that in big query and if I did that it wouldnt choose the top 1000 for each w_id it would do it over the whole set, so I could get none of the results for w_id 1209 if they are all relatively small.

I hope this is enough information, I've been given a lot of stick on here for being to vague, but im trying to give as much detail and be as logical as I can. Thanks in advance

1

1 Answers

1
votes

BigQuery supports ROW_NUMBER() which is the function you need to do this easily.

You do need to include w_id in the group by, but I think the following should do what you want:

SELECT t.*
FROM (SELECT a_name, w_name, page_url,
             SUM(IF(result = 'WIN', 1, 0)) as Impressions,
             ROW_NUMBER() OVER (PARTITION BY w_id
                                ORDER BY SUM(IF(result = 'WIN', 1, 0)) DESC
                               ) as seqnum
      from TABLE_DATE_RANGE(v3_data.v3_, TIMESTAMP('2015-08-05'), TIMESTAMP('2015-08-07'))
      WHERE server = 'server1' and w_id IN (1096, 1105, 1136, 1153, 1189, 1209)
      GROUP EACH BY 1, 2, 3, w_id
     ) t
WHERE seqnum <= 1000;

If BigQuery doesn't support row_number() in an aggregation, you might need an additional layer of aggregation.