0
votes

I have a BigQuery table that includes an String array field. For some records, the array can hold duplicate string values.

Is is possible in a BigQuery UNNEST clause to filter out the duplicates so that the UNNEST only returns distinct array string values?

2

2 Answers

1
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT * REPLACE(ARRAY(SELECT DISTINCT el FROM t.arr AS el) AS arr)
FROM `project.dataset.table` t    

You can test, play with above using dummy data as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 col1, 2 col2, ["a", "a", "b", "e", "a", "c", "b", "a"] arr, 3 col3 UNION ALL
  SELECT 4, 5, ["x", "y", "z"], 5
)
SELECT * REPLACE(ARRAY(SELECT DISTINCT el FROM t.arr AS el) AS arr)
FROM `project.dataset.table` t   

with output

Row col1    col2    arr col3     
1   1       2       a   3    
                    b        
                    e        
                    c        
2   4       5       x   5    
                    y        
                    z        
0
votes

There are many ways to do this. Since you didn't specify a desired input and output, I'll arbitrarily choose one.

Usings ARRAY_AGG(DISTINCT):

WITH data AS (
  SELECT 1 id, ["a", "a", "b", "e", "a", "c", "b", "a"] strings
)


SELECT id, ARRAY_AGG(DISTINCT string) strings
FROM data, UNNEST(strings) string
GROUP BY id

enter image description here