I'm exporting data from Firebase to BigQuery but figures shown in Firebase are different to those I extract manually in BigQuery when I COUNT(DISTINCT) users with USER_ENGAGEMENT event.
In Firebase, I defined an "All users" Audience. In the relative Audience dashboard, Activity graph (that shows active users day by day vs previous month) I see that on 1st April I had 388.597 users.
I have a standard live extraction from Firebase to BigQuery that integrates event data and I would like to find the same number here.
This is the query I use in BigQuery
SELECT
PARSE_DATE('%Y%m%d', event_date) AS Date
, COUNT(DISTINCT user_pseudo_id ) AS Active_Users
FROM
`xxxx.analytics_162988291.events_*`
WHERE FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', event_date)) = FORMAT_DATE('%Y-%m', DATE_ADD(CURRENT_DATE(), INTERVAL -EXTRACT(DAY FROM CURRENT_DATE()) DAY))
and event_name = 'user_engagement'
GROUP BY Date
ORDER BY Date;
For the 1st April, BigQuery returns 391394 distinct users, which is 2797 more than Firebase.
I found here Firebase vs BigQuery Active Users Discrepancies that this might be due to timezone settings. My project is set as GMT+1 while BigQuery uses UTC to name the tables. However in my query I'm already grouping by EVENT_DATE. To be sure, I anyway compared day-by-day the 2 figures for the month of April. here are the figures:
Firebase BigQuery Difference
388 597 391 394 -2 797
378 393 374 023 4 370
372 890 374 601 -1 711
364 520 366 367 -1 847
375 903 374 479 1 424
453 630 452 077 1 553
476 413 476 401 12
398 481 396 227 2 254
378 338 375 531 2 807
368 699 369 699 -1 000
363 063 363 436 -373
369 874 368 441 1 433
460 472 454 647 5 825
481 832 487 214 -5 382
395 781 392 564 3 217
370 940 366 189 4 751
122 434 122 549 -115
342 249 344 086 -1 837
371 934 367 609 4 325
391 817 391 765 52
390 550 391 367 -817
412 276 408 876 3 400
377 219 379 659 -2 440
370 611 366 901 3 710
367 272 365 267 2 005
365 307 369 399 -4 092
438 676 439 990 -1 314
471 515 471 363 152
392 256 395 532 -3 276
369 500 368 416 1 084
Total
11 581 442 11 566 069 15 373
If it was a timezone problem, I would have argued that the total difference should not have been too different than any other day, but the difference is instead more significant.
Also, as I'm using StandardSQL, COUNT(DISTINCT) should be accurate.
At this point, I suspect Firebase is defining "User Activity" as something different, but I fail to understand how.
Help!
EDIT
By reading Firebase -> BigQuery how to get active users for that month, week, day I started to notice how people are using a different grouping field than the one I use (user_pseudo_id), however it doesnt look like my table events_* has other app_info.* fields than
app_info.id STRING
app_info.version STRING
app_info.install_store STRING
app_info.firebase_app_id STRING
app_info.install_source STRING
could this be just a grouping problem?