I am working on a large BigQuery data set that is auto populated from Google Analytics. For this work, I am trying to calculate an exit rate using the data. Additionally, for reference, I am looking closely at the BigQuery export schema that is provided at the following link which describes the BigQuery export schema in sufficient detail.
As noted in this post from Google regarding the exit rate, exit rate can be defined as "For all pageviews to the page, the exit rate is the percentage that were the last in the session." To calculate this, I surmise that I need, for every unique visit, to check each of the pages in the hits.page.pagePath column and, if the the regular expression which finds an error is preceded by another URL that does not indicate an error, then an exit after the error path can be counted as an exit.
This looks to be a very clear case of path analysis & I am not yet certain if BigQuery can handle it easily or efficiently. In general, the URL that I am interested in looking at which contains the exit uses:
REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx')
For example, I started out using this in the following context as a trial:
SELECT hits.page.pagePath AS Page_Path
FROM [XXXXXXXX.ga_sessions_20140711]
WHERE REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx') OR REGEXP_MATCH(hits.page.pagePath, r'/[^/]+genericerror\.aspx')
Any suggestions or examples that one can point me to which highlight how someone has used BigQuery successfully to calculate exit rate would be greatly appreciated.
FURTHER DETAILS ADDED (July 14, 2014):
The data set in question here is XXXXXXXX. Here are some further details that were added. The overall query that will be executed will create a table with the following output:
Date Page (FROM CASE - there are multiple cases each day, so each day will have 7 or 8 rows each for the various cases), metric 1, metric 2 (bounce), metric 3 (exit)
The query has the following overall specifications:
Description: Pageviews and unique pageviews for ^/XXX/[^/]+error.aspx OR ^/XXX/[^/]+genericerror.aspx
Metrics: Pageviews, Unique Pageviews, Sessions, Users, Bounce rate (bounces), Exit rate (exits)
Case statement value (Page dimension value): Homepage 2.0, Homepage 1.0, Inbound Search 2.0, Inbound Search 1.0, Outbound Search 2.0, Outbound Search 1.0, Review Itinerary 2.0, Review Itinerary 1.0, Traveler info 2.0, Traveler info 1.0, Seat selector 2.0, Seat selector 1.0, Payment info 2.0, Payment info 1.0, digital 2.0 other, digital 1.0 other
CASE STATEMENT:
Case when previous page = "^/XXX/[^/]+/default\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Homepage 2.0" when previous page ="^/web/[^/]+/default\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Homepage 1.0" when previous page="^XXX/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Inbound Search 2.0" when previous page="^web/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Inbound Search 1.0" when previous page="^/XXX/[^/]+/apps/booking/flight/searchResult2\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Outbound Search 2.0" when previous page="^/web/[^/]+/apps/booking/flight/searchResult2\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Outbound Search 1.0" when previous page="^/XXX/[^/]+/apps/booking/flight/reviewRevenue\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Review Itinerary 2.0" when previous page="^/web/[^/]+/apps/booking/flight/reviewRevenue\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Review Itinerary 1.0" when previous page="^/XXX/[^/]+/apps/booking/flight/traveler\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Traveler info 2.0" when previous page="^/web/[^/]+/apps/booking/flight/traveler\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Traveler info 1.0" when previous page="^/XXX/[^/]+/apps/booking/flight/seatSelector\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Seat selector 2.0" when previous page="^/web/[^/]+/apps/booking/flight/seatSelector\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Seat selector 1.0" when previous page="^/XXX/[^/]+/apps/booking/flight/billingRevenue\.aspx" and landing page="^/XXX/[^/]+/default\.aspx" then "Payment info 2.0" when previous page="^/web/[^/]+/apps/booking/flight/billingRevenue\.aspx" and landing page="^/web/[^/]+/default\.aspx" then "Payment info 1.0" when landing page ="^/XXX/[^/]+/default\.aspx" then "digital 2.0 other" else "digital 1.0 other" end as Page
Dimensions: Date, Page, Device Category, Browser, Custom Variable (Value 04)
Filters: REGEX: page=^/XXX/[^/]+/error.aspx or ^/XXX/[^/]+/genericerror.aspx
Group by / Order by: Date, Page
The final query here must be a single query executed Ad Hoc for a specified number of days which produces the simple table noted above (which sums for the various values). The exit rate looks to be a manual calculation. To start, I used the following table to generate the URL's of the various hits:
SELECT date, CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces,
hits.page.pagePath, hits.page.pageTitle, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;
I am having issues now using the lag function to look at the previous hits.page.pagePath that was provided to do the case, and then the calculation of the exit rate.