Make binary values out of the top 2 column names that have the highest values out of many column names in SQL

Question

I have a table like the following:

ID	col1	col2	col3	col4
A	100	400	30	800
B	600	50	500	75

and I want a query where I can return something like

ID	col1	col2	col3	col4
A	0	1	0	1
B	1	0	1	0

The logic should look at each row and find which two columns hold that row's two highest values. I imagine a few CTEs or subqueries may be involved. Even a CTE that produces the following output would be good enough, but I don't know how to get there:

ID	top_2_col_name
A	col2
A	col4
B	col1
B	col3

Is there a way to apply aggregate and window functions row-wise instead of column-wise? I'm using Google's BigQuery SQL.

Gordon Linoff · Accepted Answer · 2021-01-09T14:18:48

If you want the top two values, one method is to unnest the columns into rows, rank them within each id, and keep the two highest:

with t as (
      select 'A' as id, 100 as col1, 400 as col2,  30 as col3, 800 as col4 union all
      select 'B' as id, 600 as col1, 50  as col2, 500 as col3, 75 as col4 
     )
select * except (seqnum)
from (select t.id, col.*, row_number() over (partition by t.id order by col.val desc) as seqnum
      from t cross join
           unnest(array[struct('col1' as col, t.col1 as val),
                              struct('col2', t.col2),
                              struct('col3', t.col3),
                              struct('col4', t.col4)
                             ]
                        ) col
     ) tc
where seqnum <= 2;

This is the second form of your result set.
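To get from there to the first form (one 0/1 flag per column), the ranked rows can be folded back into one row per id with conditional aggregation. This is a minimal sketch on the same sample data, not a tested production query; the CTE name `ranked` and the alias `c` are illustrative, and ties are broken arbitrarily by row_number():

```sql
with t as (
      select 'A' as id, 100 as col1, 400 as col2,  30 as col3, 800 as col4 union all
      select 'B' as id, 600 as col1, 50  as col2, 500 as col3, 75 as col4
     ),
     ranked as (
      -- one row per (id, column), ranked by value within each id
      select t.id, c.col, c.val,
             row_number() over (partition by t.id order by c.val desc) as seqnum
      from t cross join
           unnest(array[struct('col1' as col, t.col1 as val),
                        struct('col2', t.col2),
                        struct('col3', t.col3),
                        struct('col4', t.col4)
                       ]
                 ) c
     )
-- a column's flag is 1 when that column ranks in the top two for its id
select id,
       max(if(col = 'col1' and seqnum <= 2, 1, 0)) as col1,
       max(if(col = 'col2' and seqnum <= 2, 1, 0)) as col2,
       max(if(col = 'col3' and seqnum <= 2, 1, 0)) as col3,
       max(if(col = 'col4' and seqnum <= 2, 1, 0)) as col4
from ranked
group by id
order by id;
```

On the sample rows this should give 0, 1, 0, 1 for A and 1, 0, 1, 0 for B. The column list still has to be written out by hand in the final select.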

You can generalize this to any number of columns using a JSON trick: convert each row to a JSON string, extract the values of the columns you care about with a regular expression, unnest them, and rank them as before:

with t as (
      select 'A' as id, 100 as col1, 400 as col2,  30 as col3, 800 as col4 union all
      select 'B' as id, 600 as col1, 50  as col2, 500 as col3, 75 as col4 
     )
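select t.id, concat('col', cast(n + 1 as string)) as col, val
from (select t.id, cast(v as int64) as val, n,
             -- the regex returns strings, so cast before ordering numerically
             row_number() over (partition by t.id order by cast(v as int64) desc) as seqnum
      from t cross join
           -- the offset n is zero-based, so col1 sits at offset 0
           unnest(regexp_extract_all(to_json_string(t), '"col[0-9]+":([0-9]+)')) v with offset n
     ) t
where seqnum <= 2;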
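To see what the regular expression operates on, it can help to look at the intermediate values. A small sketch on the same sample data (the aliases `js` and `vals` are illustrative):

```sql
with t as (
      select 'A' as id, 100 as col1, 400 as col2,  30 as col3, 800 as col4 union all
      select 'B' as id, 600 as col1, 50  as col2, 500 as col3, 75 as col4
     )
select t.id,
       -- the whole row serialized as one JSON object
       to_json_string(t) as js,
       -- the captured digits for every "colN" key, in the order they appear
       regexp_extract_all(to_json_string(t), '"col[0-9]+":([0-9]+)') as vals
from t;
```

For row A, js should look like {"id":"A","col1":100,"col2":400,"col3":30,"col4":800}, and vals is an array of strings. Because they are strings, ordering them numerically needs a cast to int64, and because WITH OFFSET counts from zero, the offset has to be shifted by one to match the column suffix. The pattern as written only matches non-negative integers.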

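This works for any number of columns, provided they are named col1, col2, and so on in order, because the column name is rebuilt from the position of the value. Of course, if you have a data structure like that, the values should really be stored in an array.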
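As a rough illustration of the array suggestion, here is the same ranking when each row carries its values in a single array column. The column name `vals` and the rebuilt `col` labels are assumptions for this sketch, not part of the original table:

```sql
with t as (
      select 'A' as id, [100, 400, 30, 800] as vals union all
      select 'B' as id, [600, 50, 500, 75] as vals
     )
select id, concat('col', cast(n + 1 as string)) as col, val
from (select t.id, val, n,
             -- rank the array elements within each id, largest first
             row_number() over (partition by t.id order by val desc) as seqnum
      from t cross join
           -- n is the zero-based position of val inside the array
           unnest(t.vals) val with offset n
     ) tc
where seqnum <= 2;
```

With an array there is no string parsing and no cast, and the query does not change when the number of values per row changes.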
