1
votes

When exporting Google Analytics data to Google BigQuery you can setup a realtime table that is populated with Google Analytics data in real time. However, this table will contain duplicates due to the eventual consistent nature of distributed computing.

To overcome this Google has provided a view where the duplicated are filtered out. However, this view is not queryable with Standard SQL.

If I try querying with standard:

Cannot reference a legacy SQL view in a standard SQL query

We have standardized on Standard, and I am hesitant to rewrite all our batch queries to legacy for when we want to use them on realtime data. Is there a way to switch the realtime view to be a standard view?

EDIT:

This is the view definition (which is recreated every day by Google):

SELECT * 
FROM [111111.ga_realtime_sessions_20190625] 
WHERE exportKey IN (SELECT exportKey 
                    FROM 
                        (SELECT 
                             exportKey, exportTimeUsec, 
                             MAX(exportTimeUsec) OVER (PARTITION BY visitKey) AS maxexportTimeUsec  
                         FROM 
                             [111111.ga_realtime_sessions_20190625]) 
                    WHERE exportTimeUsec >= maxexportTimeUsec );
1
What is the query that defines the view? - Elliott Brossard
SELECT * FROM [111111.ga_realtime_sessions_20190625] where exportKey in ( SELECT exportKey FROM ( SELECT exportKey, exportTimeUsec, MAX(exportTimeUsec) OVER (PARTITION BY visitKey) AS maxexportTimeUsec FROM [111111.ga_realtime_sessions_20190625] ) WHERE exportTimeUsec >= maxexportTimeUsec ); - Øyvind Holmstad
Can you please add this to the question instead of a comment? It will be easier for people to read that way. - Elliott Brossard

1 Answers

2
votes

You can create a logical view like this using standard SQL:

CREATE VIEW dataset.realtime_view_20190625 AS
SELECT
  visitKey,
  ARRAY_AGG(
    (SELECT AS STRUCT t.* EXCEPT (visitKey))
    ORDER BY exportTimeUsec DESC LIMIT 1)[OFFSET(0)].*
FROM dataset.ga_realtime_sessions_20190625 AS t
GROUP BY visitKey

This selects the most recent row for each visitKey. If you want to generalize this across days, you can do something like this:

CREATE VIEW dataset.realtime_view AS
SELECT
  CONCAT('20', _TABLE_SUFFIX) AS date,
  visitKey,
  ARRAY_AGG(
    (SELECT AS STRUCT t.* EXCEPT (visitKey))
    ORDER BY exportTimeUsec DESC LIMIT 1)[OFFSET(0)].*
FROM `dataset.ga_realtime_sessions_20*` AS t
GROUP BY date, visitKey