2
votes

With the legacy BigQuery syntax, we have to use the exact_count_distinct function if we want to have the exact number of distinct values for a field.

With the Standard SQL 2011 syntax, I wonder if "count(distinct myfield)" will always return the exact number of distinct values if I don't select the 'Use Legacy SQL' option.

2

2 Answers

7
votes

COUNT(DISTINCT input) gives an exact count in standard SQL.

One important distinction is that COUNT(DISTINCT input) is more scalable than EXACT_COUNT_DISTINCT(input) in legacy BigQuery SQL, so in general the performance will be better and you are less likely to encounter resource exceeded errors.

You can read about other differences between legacy and standard SQL in the migration guide.

7
votes

Based on documentation for APPROX_COUNT_DISTINCT (with reading in between lines) :

COUNT(DISTINCT input) - exact count
APPROX_COUNT_DISTINCT(input) - approximate result