0
votes

My bq table schema:

enter image description here

Continuing this post: bigquery pivoting with nested field I'm trying to flatten this table. I would like to unnest the timeseries.data fields, i.e. the final number of rows should be equal to the total length of timeseries.data arrays. I would also like to add annotation.properties.key with certain value as additional columns, and annotation.properties.value as its value. So in this case, it would be the "margin" column. However the following query gives me error: "Unrecognized name: data". But after the last FROM, I did already: unnest(timeseries.data) as data.

flow_timestamp, channel_name, number_of_digits, timestamp, value, margin
2019-10-31 15:31:15.079674 UTC, channel_1, 4, 2018-02-28T02:00:00, 50, 0.01

query:

SELECT 
  flow_timestamp, timeseries.channel_name, 

  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ),
  data.timestamp ,data.value

,(with subq as (select * from unnest(data.annotation))
select max(if (properties.key = 'margin', properties.value, null))
from (
select * from unnest(subq.properties)
) as properties
) as margin

FROM my_table
left join unnest(timeseries.data) as data

WHERE DATE(flow_timestamp) between "2019-10-28" and "2019-11-02" 
order by flow_timestamp
1

1 Answers

0
votes

Try below

#standardSQL
SELECT 
  flow_timestamp, 
  timeseries.channel_name, 
  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ) AS number_of_digits, 
  item.timestamp, 
  item.value, 
  ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL)) 
    FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop
  ) AS margin  
FROM my_table 
LEFT JOIN UNNEST(timeseries.data) item
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
ORDER BY flow_timestamp

Below is a little less verbose version of the same solution, but I usually prefer above as it simpler to maintain

#standardSQL
SELECT 
  flow_timestamp, 
  timeseries.channel_name, 
  ( SELECT MAX(IF(key = 'number_of_digits', value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ) AS number_of_digits, 
  timestamp, 
  value, 
  ( SELECT MAX(IF(key = 'margin', value, NULL)) 
    FROM UNNEST(annotation), UNNEST(properties) 
  ) AS margin  
FROM my_table 
LEFT JOIN UNNEST(timeseries.data)   
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
ORDER BY flow_timestamp