We just started migrating our queries from Legacy to Standard SQL so we are learning on how to process nested data and arrays now.
Basically what we want to do is to retrieve from ga_sessions table the following data:
visitor id, session id, array of skus
visitor 1, session 1, [sku_0, sku_1, (...), sku_n]
visitor 1, session 2, [skus]
To do so we ran this simple query:
WITH
customers_data AS(
SELECT
fullvisitorid fv,
visitid v,
ARRAY_AGG((
SELECT
prods.productsku
FROM
UNNEST(hits.product) prods)) sku
FROM
`dataset_id.ga_sessions_*`,
UNNEST(hits) hits
WHERE
1 = 1
AND _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
--and (select count(productsku) from unnest(hits.product) where productsku is not null) = 1
GROUP BY
fv,
v
LIMIT
100 )
SELECT
*
FROM
customers_data
But we get this error:
Error: Scalar subquery produced more than one element
The data that comes from the hits field looks something like this:
So when we addded back the where clause:
and (select count(productsku) from unnest(hits.product) where productsku is not null) = 1
It does not give an error but the results have duplicated skus and we also lost the skus inside the bigger arrays.
Is there some mistake in our query preventing the arrays of being unnested?
