2
votes

I'm using standard SQL on BigQuery to create a new table based on certain conditions within an existing table. I have multiple WHEN clauses to support this (as there are several different conditions I'm checking for). What I now want to do is have multiple THEN clauses within those WHEN statements, as I'm aiming to add more than one column.

Specifically, I want to add a concatenation of two existing text fields as one field, and then an aggregated array of three existing fields as one field:

CASE WHEN
    # all three match
    one_x1 = two_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_two, score_three))
    # one and two match
      WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name ) AND ARRAY_AGG(STRUCT(score_one, score_two))
    # one and three match
      WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_three))
    # two and three match
      WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_two, score_three))
   ELSE
    NULL
 END

It is the 'AND ARRAY_AGG(STRUCT(xxxxx))' part that does not work, I have also tried using commas to separate the THEN clauses.

Is the only option to repeat the same case statement to separate the THEN clauses individually?

Sample data: sample_data desired outcome for row 1 of sample data: here

2

2 Answers

2
votes

Below is for BigQuery Standard SQL

First, let's correct your initial query so that it actually produces expected result

#standardSQL
SELECT id, 
CASE 
    WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name )
    WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name )
    WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name )
    WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name )
    ELSE NULL
END AS field1,
CASE 
    WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN [score_one, score_two, score_three]
    WHEN one_x1 = two_x1 THEN [score_one, score_two]
    WHEN one_x1 = three_x1 THEN [score_one, score_three]
    WHEN two_x1 = three_x1 THEN [score_two, score_three]
    ELSE NULL
END AS field2
FROM `project.dataset.table`

If to apply to sample data from your question - result is

Row id  field1                  field2   
1   1   Dog, Animal             0.82     
                                0.72     
2   2   Horse, Animal, Bird     0.76     
                                0.73     
                                0.9  
3   3   Dog, Animal, Chicken    0.67     
                                0.75     
                                0.65     
4   4   Bird, Chicken           0.87     
                                0.86       

Next, as I understand, you want to avoid repeating same conditions again and again in your CASE - for this - you can use below trick

#standardSQL

SELECT id, fields.* FROM (
  SELECT id, 
  CASE 
      WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN 
        STRUCT(CONCAT( object1_name, ", ", object2_name, ", ", object3_name) AS field1, [score_one, score_two, score_three] AS field2)
      WHEN one_x1 = two_x1 THEN 
        STRUCT(CONCAT( object1_name, ", ", object2_name ) AS field1, [score_one, score_two] AS field2)
      WHEN one_x1 = three_x1 THEN 
        STRUCT(CONCAT( object1_name, ", ", object3_name ) AS field1, [score_one, score_three] AS field2)
      WHEN two_x1 = three_x1 THEN 
        STRUCT(CONCAT( object2_name, ", ", object3_name ) AS field1, [score_two, score_three] AS field2)
      ELSE NULL
  END AS fields
  FROM `project.dataset.table`
)

obviously with the same output ...

And, finally, as yet another option for yo - you can eliminate all those case/when/then with below approach

#standardSQL
SELECT id, 
  (SELECT STRING_AGG(object) FROM UNNEST(objects) object WITH OFFSET
    JOIN UNNEST(pos) OFFSET USING(OFFSET)
  ) field1,
    (SELECT ARRAY_AGG(score) FROM UNNEST(scores) score WITH OFFSET
    JOIN UNNEST(pos) OFFSET USING(OFFSET)
  ) field2
FROM (
  SELECT id, 
    [object1_name, object2_name, object3_name] objects,
    [score_one, score_two, score_three] scores,
    (SELECT ARRAY_AGG(OFFSET) 
      FROM UNNEST([one_x1, two_x1, three_x1]) x WITH OFFSET 
      GROUP BY x HAVING COUNT(1) > 1
    ) pos
  FROM `project.dataset.table`
)

again with the same output

-1
votes

If you are looking to generate two separated columns in the resultset, then you would need to create two CASE statements, like:

CASE 
    WHEN one_x1 = two_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name )
    WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name )
    WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name )
    WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name )
    ELSE NULL
END AS field1,
CASE
    WHEN one_x1 = two_x1 = three_x1 THEN ARRAY_AGG(STRUCT(score_one, score_two, score_three))
    WHEN one_x1 = two_x1 THEN ARRAY_AGG(STRUCT(score_one, score_two))
    WHEN one_x1 = three_x1 THEN ARRAY_AGG(STRUCT(score_one, score_three))
    WHEN two_x1 = three_x1 THEN ARRAY_AGG(STRUCT(score_two, score_three))
    ELSE NULL
 END AS field2