I am trying to replicate Firebase Cohorts using BigQuery. I tried the query from this post: Firebase exported to BigQuery: retention cohorts query, but the results I get don't make much sense.
I managed to get user counts for period_lag 0 that are similar to what I can see in Firebase, but the rest of the numbers don't look right:
One period_lag value is missing (I only see 0, 1 and 3, but no 2), and the user counts for each lag period don't look right either. I would expect to see something like this:
I'm pretty sure that the issue is in how I replaced the parameters in the original query with those from Firebase. Here are the bits that I have updated in the original query:
#standardSQL
WITH activities AS (
  SELECT answers.user_dim.app_info.app_instance_id AS id,
    FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_MICROS(answers.user_dim.first_open_timestamp_micros))) AS period
  FROM `dataset.app_events_*` AS answers
  JOIN `dataset.app_events_*` AS questions
    ON questions.user_dim.app_info.app_instance_id = answers.user_dim.app_info.app_instance_id
  -- WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'
(...)
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2017-11-01'))
ORDER BY cohort, period_lag, period_label
So I'm using user_dim.first_open_timestamp_micros instead of create_date and user_dim.app_info.app_instance_id instead of id and parent_id. Any idea what I'm doing wrong?
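For comparison, here is a minimal sketch of a cohort query that takes the cohort from the first-open timestamp but takes the activity period from each event's own timestamp. It is not the query from the linked post. It assumes the legacy Firebase export schema, in which every row carries a repeated event_dim record with a timestamp_micros field; the alias event and the output column users are names chosen for illustration only.

```sql
#standardSQL
-- Sketch only. Assumption: legacy Firebase export schema with a repeated
-- event_dim record that has a timestamp_micros field.
WITH activities AS (
  SELECT DISTINCT
    user_dim.app_info.app_instance_id AS id,
    -- Cohort: the month in which the app was first opened.
    FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros))) AS cohort,
    -- Activity period: the month in which each event was logged.
    FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_MICROS(event.timestamp_micros))) AS period
  FROM `dataset.app_events_*`,
    UNNEST(event_dim) AS event
  WHERE user_dim.first_open_timestamp_micros IS NOT NULL
)
SELECT
  cohort,
  -- Number of whole months between the cohort month and the activity month.
  DATE_DIFF(PARSE_DATE('%Y-%m', period), PARSE_DATE('%Y-%m', cohort), MONTH) AS period_lag,
  COUNT(DISTINCT id) AS users
FROM activities
WHERE cohort >= FORMAT_DATE('%Y-%m', DATE('2017-11-01'))
GROUP BY cohort, period_lag
ORDER BY cohort, period_lag
```

In this sketch a user contributes to period_lag n only if they logged at least one event n months after their first-open month, so a gap in period_lag would mean no user in that cohort was active in that month.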

