I am trying to get unique values from multiple columns but since the datastructure is an array I can't directly do DISTINCT on all columns. I am using UNNEST() for each column and performing a UNION ALL for each column.
My idea is to create a UDF so that I can simply give the column name each time instead of performing the select every time.
I would like to replace this Query with a UDF since there are many feature columns and I need to do many UNION ALL.
SELECT DISTINCT user_log as unique_value,
'user_log' as feature
FROM `my_table`
left join UNNEST(user_Log) AS user_log
union all
SELECT DISTINCT page_name as unique_value,
'user_login_page_name' as feature
FROM `my_table`
left join UNNEST(PageName) AS page_name
order by feature;
My UDF
CREATE TEMP FUNCTION get_uniques(feature_name ARRAY<STRING>, feature STRING)
AS (
(SELECT DISTINCT feature as unique_value,
'feature' as feature
FROM `my_table`
left join UNNEST(feature_name) AS feature));
SELECT get_uniques(user_Log, log_feature);
However the UDF to select the column doesnt really work and gives the error
Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values; failed to parse CREATE [TEMP] FUNCTION statement at [8:1]
There is probably a better way of doing this. Appreciate your help.