0
votes

I am using Bigquery standard sql to count google analytics data, but when i apply unnest to break repeated record field in table, other columns like hit count data become duplicate and display more value than the actual

SELECT
    date,
    trafficSource.source as source,
    trafficSource.medium as medium,
    SUM(totals.hits) AS total_hit,
    MAX(hits.transaction.transactionid) as transaction
FROM
    `test.test.session_streaming_*`,unnest(hits) hits
WHERE
    _table_suffix BETWEEN '20180401'
    AND '20180501'   
GROUP BY
    date,
    trafficSource.source,
    trafficSource.medium

Could anyone help me to tell how can we remove duplicate data in this query

1

1 Answers

1
votes

It looks like you want to compute the max transaction ID within hits for each row, and then take the max across all rows. This should work:

SELECT
  date,
  trafficSource.source as source,
  trafficSource.medium as medium,
  SUM(totals.hits) AS total_hit,
  MAX((SELECT MAX(transaction.transactionid) FROM UNNEST(hits))) as transaction
FROM
  `test.test.session_streaming_*`
WHERE
  _table_suffix BETWEEN '20180401'
    AND '20180501'
GROUP BY
  date,
  trafficSource.source,
  trafficSource.medium