0
votes

We are currently migrating from google analytics (Mobile Data) to firebase.

The data in firebase is being streamed to BigData, and we use the following query to work out the session durations.

When I run a report on those sessions, the average session duration reported via firebase is about 4 times less than the average session duration reported by Google analytics.

I have the same problem across various devices (android, ios), and send the same custom events, at the same time, to both google analytics and firebase

using session_start event or unnest user_properties and use directly ga_session_number instead give the same results.

with timeline as
(
  select 
    geo.city as geo_city
    ,device.operating_system as device_category
    , user_pseudo_id
    , event_name
    , event_timestamp
    , lag(event_timestamp, 1) over (partition by user_pseudo_id order by event_timestamp) as prev_event_timestamp
  from 
    `projid.analytics_xxx.events_*`
) 
, session_timeline as 
(
  select
    event_name
    , geo_city
    , device_category
    , user_pseudo_id
    , event_timestamp
    , prev_event_timestamp
    , case 
        when 
           event_timestamp - prev_event_timestamp >= (30*60*1000*1000)
             or
           prev_event_timestamp is null 
          then 1
          else 0 
      end as is_new_session_flag 
  from 
    timeline
)
, marked_sessions as
(
  select 
    device_category
    , geo_city
    , user_pseudo_id
    , event_timestamp
    , prev_event_timestamp
    , sum(is_new_session_flag) over (partition by user_pseudo_id order by event_timestamp) AS user_session_id
    , min(prev_event_timestamp) over (partition by user_pseudo_id order by event_timestamp) AS first_prev_event_timestamp

  from session_timeline
)
, measured_sessions as
(
  select
    device_category
    , geo_city
    , user_pseudo_id
    , user_session_id
    , FORMAT_TIMESTAMP("%F", TIMESTAMP_MICROS(first_prev_event_timestamp)) as first_time_date
    , FORMAT_TIMESTAMP("%F", TIMESTAMP_MICROS(min(event_timestamp))) as date
    , round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000), 2) as session_duration_seconds
    , round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000 * 60), 2) as session_duration_minutes
    , round((max(event_timestamp) - min(event_timestamp))/ (1000 * 1000 * 60 * 60), 2) as session_duration_hours 
  from 
    marked_sessions
  group by
    user_pseudo_id
    , user_session_id
    , device_category
    , geo_city
    , first_prev_event_timestamp
  having 
    -- let's count only sessions longer than 10 seconds
    session_duration_seconds >= 10
)
select 
  *
from 
  measured_sessions
 order by date desc, user_pseudo_id desc, user_session_id desc

not only the average session length differs significantly, but even the session count is way off the mark.

1

1 Answers

0
votes

Are your results sampled? Usually reporting discrepancies in Google Analytics are due to sampling. Try to import data grouped by fewer dimensions or use a shorter period of time.