We are currently migrating from google analytics (Mobile Data) to firebase.
The data in firebase is being streamed to BigData, and we use the following query to work out the session durations.
When I run a report on those sessions, the average session duration reported via firebase is about 4 times less than the average session duration reported by Google analytics.
I have the same problem across various devices (android, ios), and send the same custom events, at the same time, to both google analytics and firebase
using session_start event or unnest user_properties and use directly ga_session_number instead give the same results.
with timeline as
(
select
geo.city as geo_city
,device.operating_system as device_category
, user_pseudo_id
, event_name
, event_timestamp
, lag(event_timestamp, 1) over (partition by user_pseudo_id order by event_timestamp) as prev_event_timestamp
from
`projid.analytics_xxx.events_*`
)
, session_timeline as
(
select
event_name
, geo_city
, device_category
, user_pseudo_id
, event_timestamp
, prev_event_timestamp
, case
when
event_timestamp - prev_event_timestamp >= (30*60*1000*1000)
or
prev_event_timestamp is null
then 1
else 0
end as is_new_session_flag
from
timeline
)
, marked_sessions as
(
select
device_category
, geo_city
, user_pseudo_id
, event_timestamp
, prev_event_timestamp
, sum(is_new_session_flag) over (partition by user_pseudo_id order by event_timestamp) AS user_session_id
, min(prev_event_timestamp) over (partition by user_pseudo_id order by event_timestamp) AS first_prev_event_timestamp
from session_timeline
)
, measured_sessions as
(
select
device_category
, geo_city
, user_pseudo_id
, user_session_id
, FORMAT_TIMESTAMP("%F", TIMESTAMP_MICROS(first_prev_event_timestamp)) as first_time_date
, FORMAT_TIMESTAMP("%F", TIMESTAMP_MICROS(min(event_timestamp))) as date
, round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000), 2) as session_duration_seconds
, round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000 * 60), 2) as session_duration_minutes
, round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000 * 60 * 60), 2) as session_duration_hours
from
marked_sessions
group by
user_pseudo_id
, user_session_id
, device_category
, geo_city
, first_prev_event_timestamp
having
-- let's count only sessions longer than 10 seconds
session_duration_seconds >= 10
)
select
*
from
measured_sessions
order by date desc, user_pseudo_id desc, user_session_id desc
not only the average session length differs significantly, but even the session count is way off the mark.