1
votes

I have a table that looks like this:

+----+------+------+------+------+------+
| id | col1 | col2 | col3 | col4 | col5 |
+----+------+------+------+------+------+
| a  |    1 | null | null | null | null |
| b  |    1 | 2    | 3    | 4    | null |
| c  |    1 | 2    | 3    | 4    | 5    |
| d  |    2 | 1    | 7    | null | 4    |
+----+------+------+------+------+------+

I want to create an aggregated table where for each id I want an array that contains non null value from all the other columns. The output should look like this:

+-----+-------------+
| id  |   agg_col   |
+-----+-------------+
| a   | [1]         |
| b   | [1,2,3,4]   |
| c   | [1,2,3,4,5] |
| d   | [2,1,7,4]   |
+-----+-------------+

Is it possible to produce the output using bigquery standard sql?

1

1 Answers

3
votes

Below is not super generic solution, but works for your specific example that you provided - id is presented with alphanumeric (not starting with digit) and rest of columns are numbers - integers

#standardSQL
SELECT id,
  ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t

You can test, play with above using sample data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
  ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id   

with result as

Row id  agg_col_as_array    agg_col_as_string    
1   a   1                   [1]  
2   b   1                   [1,2,3,4]    
        2        
        3        
        4        
3   c   1                   [1,2,3,4,5]  
        2        
        3        
        4        
        5        
4   d   2                   [2,1,7,4]    
        1        
        7        
        4        

Do you think it is possible to do this by mentioning specific columns and then binding them into an array?

Sure, it is doable - see below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id, 
  ARRAY(
    SELECT col 
    FROM UNNEST([col1, col2, col3, col4, col5]) col 
    WHERE NOT col IS NULL
  ) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
  ARRAY(
    SELECT CAST(col AS STRING) 
    FROM UNNEST([col1, col2, col3, col4, col5]) col 
    WHERE NOT col IS NULL
  ), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id       

BUT ... this is not the best option you have as you need to manage and adjust number and names of columns in each case for different uses

Below solution is adjusted version of my original answer to address your latest comment - Actually the sample was too simple. Both of my id and other columns have alphanumeric and special characters.

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
  SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
  SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
  SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
  ARRAY(
    SELECT col 
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET 
    WHERE col != 'null' AND OFFSET > 0
  ) AS agg_col_as_array,
  CONCAT('[', ARRAY_TO_STRING(
    ARRAY(
      SELECT col 
      FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET 
      WHERE col != 'null' AND OFFSET > 0
    ), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id   

both with same result as before

Row id  agg_col_as_array    agg_col_as_string    
1   a   1                   [1]  
2   b   1                   [1,2,3,4]    
        2        
        3        
        4        
3   c   1                   [1,2,3,4,5]  
        2        
        3        
        4        
        5        
4   d   2                   [2,1,7,4]    
        1        
        7        
        4