1
votes

Let's say I have an XML data which gets saved in Bigquery table, so big query structure could be some normal atomic column then repeated record column, inside that record column some atomic column and new repeated record column, and this is for many columns.

Now I want to know the best way to unnest, mainly when we need data from each column and apply aggregation lets say(sum ) on the repeated column.

For example, if I write a query as Select name, sum(unit.unitprice) from test, unnest(product1) p, Unnest(p.unit) unit - here unit is repeated record column in repeated product record column.

It works but as I have another column for another product (for example), so if I write

Select name, sum(unit1.unitprice), sum(unit2.tax) from test, unnest(product1) p1, Unnest(p1.unit) unit1, unnest(product2) p2, Unnest(p2.unit) unit2

Here problem will start as all three cartesian product would give more row and sum would be wrong.

This is just an example, the table has more such column on which I want to apply sum and all.

JSON File data which can use similar structure is:-

{"SKU":"123456","time":"2018-08-27T16:42:04.000","fan":[{"sequence":1,"unit":[{"sell":"126.89","cost":"126.89"},{"sell":"126.89","cost":"126.89"}],"product":[{"fee":"40","reason":"testing"},{"fee":"400","reason":"testing1"}]}], "AC":[{"sequence":1,"unit":[{"sell":"1500","cost":"1500"}]},{"sequence":2,"unit":[{"sell":"1500","cost":"1500"},{"sell":"200","cost":"250"}]}]}

Queries:- This query gives right sum

select any_value(sku), sum(unit.cost) from nonpii_air_ticketed.test, unnest(fan) f, unnest(f.unit) unit

As we add another component (product) in unnest unit sum changes and gives wrong value and the same applies if we have more columns like it.

select any_value(sku), sum(unit.cost), sum(fee.fee) from nonpii_air_ticketed.test, unnest(fan) f, unnest(f.unit) unit, unnest(f.product) fee

Table schema with value

1
you are missing to present some simplified example of your data and expected output - please add and you will get the answer for sure. otherwise it can be a challenge for us to help you. You can read How to Ask and show a Minimal, Complete, and Verifiable exampleMikhail Berlyant
Apologies for the incomplete question, I have added some information, hope this helpsPankaj Bajpai

1 Answers

1
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  ANY_VALUE(sku), 
  SUM((SELECT SUM(cost) FROM f.unit)), 
  SUM((SELECT SUM(fee) FROM f.product)) 
FROM nonpii_air_ticketed.test, 
UNNEST(fan) f