Querying on Bigquery repeated fields

Question

Below is the schema of my BigQuery table. I am selecting the sentence_id, store and BU_model and inserting data into another table in BigQuery. The datatypes for the new table generated are integer, repeated and repeated respectively. I want to flatten/unnest the repeated fields so that they are created as STRING fields in my second table. How could this be achieved using standard sql?

+- sentences: record (repeated)
|  |- sentence_id: integer                                                                                                                             
|  |- autodetected_language: string                                                                                                                    
|  |- processed_language: string 
|  +- attributes: record
|  |  |- agent_rating: integer
|  |  |- store: string (repeated)
|  +- classifications: record
|  |  |- BU_Model: string (repeated)

The query that I am using to create the second table is as below. I would want to query on the BU_Model as a STRING column.

SELECT sentence_id ,a.attributes.store,a.classifications.BU_Model
FROM staging_table , unnest(sentences) a

Expected Output should look like:

Staging table:

41783851    regions     Apparel
            district    Footwear
12864656    regions
            district

Final Target Table:

41783851    regions     Apparel
41783851    regions     Footwear            
41783851    district    Apparel
41783851    district    Footwear
12864656    regions
12864656    district

I tried the below query and it seems to work as expected, but this means that i would have to unnest every expected repeated field. My table in Bigquery has 50+ columns which are repeated. Is there a easier way around this ?

SELECT
sentence_id,
flattened_stores,
flattened_Model
FROM `staging`  
left join unnest(sentences) a
left join unnest(a.attributes.store) as flattened_stores
left join unnest(a.classifications.BU_Model) as flattened_Model

Mikhail Berlyant Mikhail Berlyant · Accepted Answer · 2018-10-11T20:20:56

Assuming you want still three columns in your output - with arrays being flattened into string

SELECT sentence_id , 
  ARRAY_TO_STRING(a.attributes.store, ',') store,
  ARRAY_TO_STRING(a.classifications.BU_Model, ',') BU_Model
FROM staging_table , unnest(sentences) a

UPDATE to address recent changes in question

In BigQuery Standard SQL - use of LEFT JOIN UNNEST() (as you did in your last query) is the most reasonable way to do what you want to get as a result

In BigQuery Legacy SQL - you can use FLATTEN syntax - but it has same drawback of needing to repeat same for all 50+ column

Very simplified example:

#legacySQL
SELECT sentence_id, store, BU_Model
FROM (FLATTEN([project:dataset.stage], BU_Model))

Conclusion: I would go with LEFT JOIN UNNEST() approach

Querying on Bigquery repeated fields

1 Answers