1
votes

Using Big Query, I am trying to query

count(distinct col), count (distinct col) over (partition by col2)

But I receive an unexpected error.

Here is the template of the query i'm trying to execute:

SELECT
          country,
          partner,
          segment_id,
          COUNT(DISTINCT pv_id) pvs,
          COUNT(DISTINCT pv_id) over(PARTITION BY country) country_total_pvs
FROM (...)
GROUP BY
          country,
          partner,
          segment_id

And the error I keep getting:

Error: Expression 'pv_id' is not present in the GROUP BY list

Without the 5th column (the analytical count), the query executes without any error.

Thoughts?

Many Thanks!

2
Sample data and desired results would explain what you want to do. As written, the query is not valid, so it doesn't really convey your intent.Gordon Linoff
Here, you can take following code for example:<br/> SELECT language, title, COUNT(DISTINCT contirbutor_username), COUNT(DISTINCT contirbutor_username) over( PARTITION BY language) FROM [bigquery-public-data:samples.wikipedia] GROUP BY language, title what I really want to have is grouping result of different dimensions.goidelg

2 Answers

1
votes

The following should work, but I'm guessing does not do what you want:

SELECT country, partner, segment_id,
       COUNT(DISTINCT pv_id) pvs,
       SUM(COUNT(DISTINCT pv_id)) OVER (PARTITION BY country) as country_total_pvs
FROM (...) q
GROUP BY country, partner, segment_id;

Instead:

SELECT country, partner, segment_id,
       COUNT(DISTINCT pv_id) pvs,
       country_total_pvs
FROM (SELECT q.*,
             COUNT(DISTINCT pv_id) OVER (PARTITION BY country) as country_total_pvs
      FROM (...) q
     ) q
GROUP BY country, partner, segment_id, country_total_pvs;
0
votes

Try below

SELECT
  a.country AS country,
  partner,
  segment_id,
  COUNT(DISTINCT pv_id) pvs,
  country_total_pvs
FROM youTable AS a
LEFT JOIN (
  SELECT country, COUNT(DISTINCT pv_id) country_total_pvs
  FROM youTable GROUP BY country
) AS b
ON a.country = b.country
GROUP BY
  country,
  partner,
  segment_id,
  country_total_pvs  

Have in mind - above will provide "correct" distinct counts for BigQuery StandardSQL, but in BigQuery Legacy SQL COUNT(DISTINCT) is a statistical approximation and is not guaranteed to be exact. You can use EXACT_COUNT_DISTINCT instead

Below is slightly optimized version

SELECT
  a.country AS country,
  partner,
  segment_id,
  pvs,
  country_total_pvs
FROM (
  SELECT
    country,
    partner,
    segment_id,
    COUNT(DISTINCT pv_id) pvs
  FROM youTable 
  GROUP BY country, partner, segment_id
) AS a
LEFT JOIN (
  SELECT country,
    COUNT(DISTINCT pv_id) country_total_pvs
  FROM youTable GROUP BY country
) AS b
ON a.country = b.country