Let's say I have an XML data which gets saved in Bigquery table, so big query structure could be some normal atomic column then repeated record column, inside that record column some atomic column and new repeated record column, and this is for many columns.
Now I want to know the best way to unnest, mainly when we need data from each column and apply aggregation lets say(sum ) on the repeated column.
For example, if I write a query as Select name, sum(unit.unitprice) from test, unnest(product1) p, Unnest(p.unit) unit - here unit is repeated record column in repeated product record column.
It works but as I have another column for another product (for example), so if I write
Select name, sum(unit1.unitprice), sum(unit2.tax) from test, unnest(product1) p1, Unnest(p1.unit) unit1, unnest(product2) p2, Unnest(p2.unit) unit2
Here problem will start as all three cartesian product would give more row and sum would be wrong.
This is just an example, the table has more such column on which I want to apply sum and all.
JSON File data which can use similar structure is:-
{"SKU":"123456","time":"2018-08-27T16:42:04.000","fan":[{"sequence":1,"unit":[{"sell":"126.89","cost":"126.89"},{"sell":"126.89","cost":"126.89"}],"product":[{"fee":"40","reason":"testing"},{"fee":"400","reason":"testing1"}]}], "AC":[{"sequence":1,"unit":[{"sell":"1500","cost":"1500"}]},{"sequence":2,"unit":[{"sell":"1500","cost":"1500"},{"sell":"200","cost":"250"}]}]}
Queries:- This query gives right sum
select any_value(sku), sum(unit.cost) from nonpii_air_ticketed.test
, unnest(fan) f, unnest(f.unit) unit
As we add another component (product) in unnest unit sum changes and gives wrong value and the same applies if we have more columns like it.
select any_value(sku), sum(unit.cost), sum(fee.fee) from nonpii_air_ticketed.test
, unnest(fan) f, unnest(f.unit) unit, unnest(f.product) fee