We have a bigquery table that looks like this:
with
my_data as (
select 1 as num1, 32 as num2, 43 as num3, 'a' as letter union all
select 2 as num1, 21 as num2, 45 as num3, 'a' as letter union all
select 3 as num1, 99 as num2, 47 as num3, 'a' as letter union all
select 4 as num1, 83 as num2, 48 as num3, 'a' as letter union all
select 5 as num1, 55 as num2, 49 as num3, 'a' as letter union all
select 6 as num1, 35 as num2, 51 as num3, 'b' as letter union all
select 7 as num1, 94 as num2, 52 as num3, 'b' as letter union all
select 8 as num1, 17 as num2, 55 as num3, 'b' as letter union all
select 9 as num1, 33 as num2, 56 as num3, 'b' as letter union all
select 10 as num1, 81 as num2, 37 as num3, 'b' as letter union all
select 11 as num1, 42 as num2, 38 as num3, 'a' as letter union all
select 12 as num1, 26 as num2, 39 as num3, 'a' as letter union all
select 13 as num1, 92 as num2, 41 as num3, 'a' as letter union all
select 14 as num1, 38 as num2, 43 as num3, 'a' as letter union all
select 15 as num1, 31 as num2, 46 as num3, 'a' as letter union all
select 16 as num1, 53 as num2, 48 as num3, 'b' as letter union all
select 17 as num1, 49 as num2, 49 as num3, 'b' as letter union all
select 18 as num1, 71 as num2, 51 as num3, 'b' as letter union all
select 19 as num1, 67 as num2, 52 as num3, 'b' as letter union all
select 20 as num1, 62 as num2, 54 as num3, 'b' as letter
)
letter
is the column to group by, and num1, num2, num3
are the 3 columns for which we'd like to compute 0 - 100 %iles. To be more clear, we'd like to return a table with 202 rows, and with the columns letter pctile value1 value2 value3
. letter
is a
(101 times) and b
(101) times, pctile
goes from 0,1,2,3... 100,0,1,2,3... 100
, and value1 value2 value3
are the values that correspond with the 0th, 1st, 2nd, 3rd, 4th, etc. percentiles (for each group / letter).
I previously posted this very similar question here - Compute percentiles by group in BigQuery - where a helpful solution was provided. However, this solution was for a base case where 0 - 100 %ile rows were computed for only a single column. Now, and in our data's real example, we are dealing with multiple columns. The solution from the previous post, when expanded to our new data with 3 columns, does not work.
SELECT letter, pctile, value1, value2, value3
FROM (
SELECT
letter,
APPROX_QUANTILES(num1, 100) AS value1,
APPROX_QUANTILES(num2, 100) AS value2,
APPROX_QUANTILES(num3, 100) AS value3,
FROM my_data
GROUP BY letter
) as t,
t.value1 WITH OFFSET AS pctile
this does technically return 202 rows, however the values in each row for value2
and value3
are not individual values, but rather appear to be entire arrays of length == 100. I've tried different things to get the desired result (202 rows, each row has the correct individual value for value1 value2 value3
), with no success. Is this possible to do?