3
votes

I have a table with 2157 records, let's say with 3 columns (A,B,C) and I know that in column A there are 2154 different values.

Using Tableau Desktop (and its own fenctions) connected to BigQuery, I get these results:

  • Count(A) --> 2157
  • CountD(A) --> 2180 (more than Count!!!)

If i run the same calculations using queries in Big Query console, I get:

  • Count --> 2157
  • CountD --> 2154 (right)

How can this be possible?

By the way, if use a CSV file of the same table and I use it as datasource using Tableau Public (no Desktop), I get correct results as in Big Query.

Thanks in advance

2
Any chance that "A" could be null in BigQuery? The COUNTD function in Tableau doesn't count nulls even if B and C have values for these rows.astro11
no, all values ar not null... but there's no way (imho) that a count distinct returns more record than a countFabio Fantoni
Sounds like a bug to me. Consider reporting it to Tableau.Victor K.

2 Answers

1
votes

If you use the DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact - the documentation is also clear about this.

If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP BY on the relevant field(s) and then applying COUNT(*). The GROUP BY approach is more scalable but might incur a slight up-front performance penalty.

https://cloud.google.com/bigquery/query-reference#aggfunctions

0
votes

Try using a calculated field with this formula - RAWSQLAGG_INT("count(unique(%1))",['Your column'])

Slows down performance, so only use when you really need it.