0
votes

We have a bigquery table that looks like this:

with
  my_data as (
    select 1 as num1, 32 as num2, 43 as num3, 'a' as letter union all
    select 2 as num1, 21 as num2, 45 as num3, 'a' as letter union all
    select 3 as num1, 99 as num2, 47 as num3, 'a' as letter union all
    select 4 as num1, 83 as num2, 48 as num3, 'a' as letter union all
    select 5 as num1, 55 as num2, 49 as num3, 'a' as letter union all
    select 6 as num1, 35 as num2, 51 as num3, 'b' as letter union all
    select 7 as num1, 94 as num2, 52 as num3, 'b' as letter union all
    select 8 as num1, 17 as num2, 55 as num3, 'b' as letter union all
    select 9 as num1, 33 as num2, 56 as num3, 'b' as letter union all
    select 10 as num1, 81 as num2, 37 as num3, 'b' as letter union all
    select 11 as num1, 42 as num2, 38 as num3, 'a' as letter union all
    select 12 as num1, 26 as num2, 39 as num3, 'a' as letter union all
    select 13 as num1, 92 as num2, 41 as num3, 'a' as letter union all
    select 14 as num1, 38 as num2, 43 as num3, 'a' as letter union all
    select 15 as num1, 31 as num2, 46 as num3, 'a' as letter union all
    select 16 as num1, 53 as num2, 48 as num3, 'b' as letter union all
    select 17 as num1, 49 as num2, 49 as num3, 'b' as letter union all
    select 18 as num1, 71 as num2, 51 as num3, 'b' as letter union all
    select 19 as num1, 67 as num2, 52 as num3, 'b' as letter union all
    select 20 as num1, 62 as num2, 54 as num3, 'b' as letter
  )

letter is the column to group by, and num1, num2, num3 are the 3 columns for which we'd like to compute 0 - 100 %iles. To be more clear, we'd like to return a table with 202 rows, and with the columns letter pctile value1 value2 value3. letter is a (101 times) and b (101) times, pctile goes from 0,1,2,3... 100,0,1,2,3... 100, and value1 value2 value3 are the values that correspond with the 0th, 1st, 2nd, 3rd, 4th, etc. percentiles (for each group / letter).

I previously posted this very similar question here - Compute percentiles by group in BigQuery - where a helpful solution was provided. However, this solution was for a base case where 0 - 100 %ile rows were computed for only a single column. Now, and in our data's real example, we are dealing with multiple columns. The solution from the previous post, when expanded to our new data with 3 columns, does not work.

SELECT letter, pctile, value1, value2, value3
FROM (
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS value1,
    APPROX_QUANTILES(num2, 100) AS value2,
    APPROX_QUANTILES(num3, 100) AS value3,
  FROM my_data
  GROUP BY letter
) as t, 
t.value1 WITH OFFSET AS pctile

this does technically return 202 rows, however the values in each row for value2 and value3 are not individual values, but rather appear to be entire arrays of length == 100. I've tried different things to get the desired result (202 rows, each row has the correct individual value for value1 value2 value3), with no success. Is this possible to do?

1

1 Answers

1
votes

Try below

SELECT letter, pctile, value1, value2, value3
FROM (
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS value1,
    APPROX_QUANTILES(num2, 100) AS value2,
    APPROX_QUANTILES(num3, 100) AS value3,
  FROM my_data
  GROUP BY letter
) as t
,t.value1 WITH OFFSET AS pctile
,t.value2 WITH OFFSET AS pctile2
,t.value3 WITH OFFSET AS pctile3
WHERE pctile = pctile2
AND pctile = pctile3