0
votes

I have tried PostgreSQL:count distinct (col1,col2,col3,col4,col5) in BigQuery :Count distinct concat(col1,col2,col3,col4,col5)

My scenario is I need to get same result as PostgreSQL in BigQuery

Though this scenario works on 3 columns ,I am not getting same value as PostgreSQL for 5 columns.

sample query:

 select col1,
count(distinct concat((col1,col2,col3,col4,col5)
from table A
group by col1

when I remove distinct and concat, simple count(col1,col2,col3,col4,col5) gives exact value as populated in PostgreSQL. But i need to have distinct of these columns. Is there any way to achieve this? and does bigquery concat works differently?

2
Please show example input and output. - jjanes
Occam's razor: the data is different. Do you know the answer in advance? If not did you prove the data in the separate databases is actually the same? - Belayer

2 Answers

1
votes

Below few options for BigQuery Standard SQL

#standardSQL
SELECT col1,
  COUNT(DISTINCT TO_JSON_STRING((col1,col2,col3,col4,col5)))
FROM A
GROUP BY col1   

OR

#standardSQL
SELECT col1,
  COUNT(DISTINCT FORMAT('%T', [col1,col2,col3,col4,col5]))
FROM A
GROUP BY col1
0
votes

An alternative suitable for the many databases that don't support that form of COUNT DISTINCT:

SELECT COUNT(*)
FROM (
  SELECT DISTINCT Origin, Dest, Reporting_Airline 
  FROM `fh-bigquery.flights.ontime_201908` 
  WHERE FlightDate_year = "2018-01-01" 
)

My guess on why CONCAT didn't work in your sample: Do you have any null columns?