I have a dataset with ~20M rows and I'm observing the following behavior.
The query below returns the error "Response too large to return". The 'id' field is shared among multiple records and the 'field' field has some arbitrary value for each record. I would expect that the result set should only contain 10 rows, well below the query response limit.
SELECT id, COUNT(DISTINCT field)
FROM [my.dataset]
GROUP BY id
LIMIT 10
However, when the DISTINCT keyword is removed from the COUNT aggregation function, BigQuery returns 10 results as expected.
SELECT id, COUNT(field)
FROM [my.dataset]
GROUP BY id
LIMIT 10
I don't understand why the first query returns an error and the second completes successfully. Shouldn't both queries return the same number of rows?