0
votes

I am trying to unnest a field but something is wrong with my query.

Sample data in my table

'1234', '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }'

There are 2 fields in the dataset: row_id and parts, where parts is a dictionary object with list items (categories) in it but datatype of a parts is string. I would like the output to be individual row for each category.

This is what I have tried but I am not getting any result back.

#standardSQL
with t as (
select "1234" as row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }' as parts 
)
select row_id, _categories
from t,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(parts, '$.items'), r'"categories":"(.+?)"')) _categories

expected result

id, _categories
1234, cat1
1234, cat2
1234, cat3
1
The problem is that you're expecting not to have a space after the : based on your regex.Elliott Brossard
I got the same result after adding a spacesam

1 Answers

3
votes

Below is for BigQuery Standard SQL

#standardSQL
WITH t AS (
  SELECT "1234" AS row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }' AS parts 
)
SELECT row_id, REPLACE(_categories, '"', '') _categories
FROM t, UNNEST(SPLIT(REGEXP_EXTRACT(
  JSON_EXTRACT(parts, '$.items'), 
  r'"categories":\[(.+?)]'))
) _categories 

and produces expected result

Row row_id  _categories  
1   1234    cat1     
2   1234    cat2     
3   1234    cat3      

Update

Above solution was mostly focused on fixing regexp used in extract - but not addressed more generic case of having multiple products. Below solution addresses such more generic case

#standardSQL
WITH t AS (
  SELECT "1234" AS row_id, '''{ "id" : "123" , "items" : [ 
      { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }},
      { "quantity" : 2 , "product" : { "id" : "p2" , "categories" : [ "cat4","cat5","cat6"]  }}
    ] }''' AS parts 
)
SELECT row_id, REPLACE(category, '"', '') category
FROM t, UNNEST(REGEXP_EXTRACT_ALL(parts, r'"categories" : \[(.+?)]')) categories,
UNNEST(SPLIT(categories)) category  

with result

Row row_id  category     
1   1234    cat1     
2   1234    cat2     
3   1234    cat3     
4   1234    cat4     
5   1234    cat5     
6   1234    cat6