2
votes

I have a table t1 with structure:

  • id: integer
  • name: repeated (string)

I have entries with multiple names:

  • 123; name1,name2
  • 124; name1,name3,name4,name5
  • 125; name1,name4,name7

I want to return lines (unique) that have name equal name1 and name4 which would return line 2 and 3, id 124 and 125

Bigquery automatically flatens results. But that makes hard to return rows that include multiple

And must do this with LegacySQL because of some constraints. I've tried:

SELECT _id AS _id, GROUP_CONCAT_UNQUOTED(name) AS name where name like "%name1%" and name like "%name4%"

Tried with = as well and not working.

Also this one returns all lines because of name1:

name IN ("name1", "name4")

( name = "name1" or name = "name4")

1

1 Answers

2
votes

must do this with LegacySQL because of some constraints

Below is for BigQuery Legacy SQL

Option #1

#legacySQL
SELECT id, NEST(name) name
FROM [project:dataset.table] 
GROUP BY id
HAVING SUM(name IN ('name1', 'name4')) = 2

and set destination table with no Flatten Results set to false (unchecked in UI)

Option #2

#legacySQL
SELECT id, GROUP_CONCAT(name) name
FROM [project:dataset.table] 
GROUP BY id
HAVING SUM(name IN ('name1', 'name4')) = 2

with result (assuming that table has sampled data from your question)

Row id  name     
1   124 name1,name3,name4,name5  
2   125 name1,name4,name7    

Note: above options assume there is no duplicate names in name array, which looks like most likely your case

Option #3 - this is actually for BigQuery Standard SQL - assuming you can migrate your stuff to Standard SQL

#standardSQL
SELECT id, name
FROM `project.dataset.table`
WHERE 2 = (SELECT COUNT(DISTINCT item) FROM UNNEST(name) item WHERE item IN ('name1','name4'))

with result

Row id  name     
1   124 name1    
        name3    
        name4    
        name5    
2   125 name1    
        name4    
        name7