I am trying to validate the events we stream into BigQuery by cross-checking them against Mixpanel. However, BigQuery always shows more events than Mixpanel for every event type we stream. I first suspected duplication, but each event in BigQuery has a different timestamp. The only other cause I can think of is significant lag in the streaming insert, which could keep certain events from showing up in the table for up to an hour. If anyone can give me insight into this issue, I would appreciate it. To clarify:
I am validating the BigQuery data by comparing how many events are streamed in per day.
The difference is fairly small. For example, on one particular day Mixpanel shows 634 events while BigQuery shows 703.
I have already accounted for the timezone difference: Mixpanel reports events in your current time zone, while my company stores events in UTC.
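
To make the per-day comparison concrete, here is a minimal Python sketch of the two checks described above. It is only an illustration under stated assumptions: events are assumed to be exported as `(user_id, event_name, utc_timestamp)` tuples, the function names `daily_counts` and `near_duplicates` are hypothetical, and a fixed UTC offset stands in for the Mixpanel time zone (which ignores daylight saving). The near-duplicate check is an extra idea, not something the data has confirmed: a retried event could arrive with a slightly different timestamp and so would not look like an exact duplicate.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(timestamps_utc, utc_offset_hours):
    """Count events per calendar day after shifting UTC timestamps
    to a fixed offset (assumed stand-in for the Mixpanel time zone)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return Counter(ts.astimezone(tz).date() for ts in timestamps_utc)


def near_duplicates(events, window_seconds=5):
    """Return events that repeat the same (user_id, event_name) within
    window_seconds of the previous event with that key.

    events: iterable of (user_id, event_name, utc_timestamp) tuples
    (an assumed export shape, not a BigQuery or Mixpanel format).
    """
    suspects = []
    last_seen = {}
    # Sort so that events with the same key are adjacent and in time order.
    for user_id, name, ts in sorted(events, key=lambda e: (e[0], e[1], e[2])):
        key = (user_id, name)
        prev = last_seen.get(key)
        if prev is not None and (ts - prev).total_seconds() <= window_seconds:
            suspects.append((user_id, name, ts))
        last_seen[key] = ts
    return suspects


if __name__ == "__main__":
    t0 = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
    sample = [
        ("u1", "click", t0),
        ("u1", "click", t0 + timedelta(seconds=2)),  # possible retry
        ("u2", "click", t0 + timedelta(hours=6)),
    ]
    # Per-day counts as they would appear in a UTC-8 time zone.
    print(daily_counts([ts for _, _, ts in sample], utc_offset_hours=-8))
    print(near_duplicates(sample))
```

For the day quoted above, the gap is 703 - 634 = 69 events, roughly 10.9% of the Mixpanel count. If the near-duplicate check flags a similar share of rows, retries become a plausible explanation; if it flags almost nothing, the cause is more likely elsewhere.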