0
votes

From table containing department, person and url. Want to select sample url for every combination of department and individual person.

select dept,person,FIRST_VALUE(url) OVER (PARTITION BY dept,person ORDER BY url) from cs.dept_person_url 

Above query fails with Error: Resources exceeded during query execution.

READ  
$1, $2, $3
FROM __SHUFFLE0
SORT  
$1 ASC, $2 ASC, $3 ASC
ANALYTIC FN  
FIRST_VALUE($13) OVER (PARTITION BY $11, $12 ORDER BY $13 ASC RANGE BETWEEN UNBOUNDED PRECEDING AND [...]
WRITE  
$14, $15, $10
TO __SHUFFLE1

The stage where it fails has:

Input Rows: 54,344,863 
Output Rows: 48,536,071

Please suggest any alternative approach which might work.

2
It must be order by clause as part of the window function but dont see any other option to get the sample url for all groups.axrd

2 Answers

2
votes

Try this:

#standardSQL
select dept, person, ANY_VALUE(url)
from cs.dept_person_url 
group by dept, person

This since you want any URL, and asking for the OVER(ORDER BY) forces a sort over all URLs in a partition - and that's too much work for just getting a sample for each.

2
votes

Is there a reason

SELECT dept, person, FIRST(url)
FROM cs.dept_person_url
GROUP BY dept, person

Wouldn't provide what you're looking for? Caveat - I'm used to using Google Big Query's legacy-sql rather than standard-sql