2
votes

We have migrated all of our old Firebase BigQuery events tables to the new schema using the provided script. One thing we noticed was that the size of the daily tables increased dramatically.

For example, the data from 4/1/18 in the old schema was 3.5MM rows and 8.7 GB. Once migrated, the new table for the same date is 32.3MM rows and 27 GB. That is more than 9 times as many rows and over 3 times the storage size.
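As a quick sanity check on those numbers, here is a small sketch that computes the ratios and the average bytes per row. It assumes 1 GB = 10^9 bytes, which may not match how the sizes were reported; the variable names are only illustrative.

```python
# Figures quoted above for the 4/1/18 table, before and after migration.
old_rows, new_rows = 3.5e6, 32.3e6
old_gb, new_gb = 8.7, 27.0

BYTES_PER_GB = 1e9  # assumption: decimal gigabytes

row_ratio = new_rows / old_rows
size_ratio = new_gb / old_gb

# Average row size in each schema.
old_bytes_per_row = old_gb * BYTES_PER_GB / old_rows
new_bytes_per_row = new_gb * BYTES_PER_GB / new_rows

print(f"rows grew {row_ratio:.1f}x, size grew {size_ratio:.1f}x")
print(f"avg bytes/row: old {old_bytes_per_row:.0f}, new {new_bytes_per_row:.0f}")
```

The row count grows much faster than the total size, so the average row in the new table is considerably smaller than in the old one.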

Can someone tell me why the same data is so much larger in the new schema?

The result is that we are getting charged significantly more in BigQuery query costs when reading the tables from the new schema versus the old schema.

1
Good question Mark. The only thing I can think of is that the new format is a lot more denormalized, and we'd be duplicating properties between the events. I've asked around to see if anyone here knows top of mind. - Frank van Puffelen

1 Answer

2
votes

firebaser here

While increasing the size of the exported data definitely wasn't a goal, it is an expected side-effect of the new schema.

In the old storage format, events were stored in bundles. While I don't know exactly how events were bundled, each bundle was always a group of events, each with its own unique properties plus properties shared across the bundle. This meant that you frequently had to unnest the data in your query, or cross join the table with itself, to get to the raw events, and then combine and group them again to fit your requirements.

In the new storage format, each event is stored separately. This definitely increases the storage size, since properties that were shared between events in a bundle are now duplicated for each event. But queries on the new format should be easier to read and can process the data faster, since they don't have to unnest it first.
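To illustrate the difference between the two layouts, here is a minimal Python sketch. The field names (`user_id`, `platform`, `app_version`, `events`, `event_name`, `ts`) and the `flatten_bundle` helper are hypothetical and are not the actual export schema, and serialized JSON length is only a rough stand-in for how BigQuery measures table size.

```python
import json


def flatten_bundle(bundle):
    """Return one row per event, copying the bundle's shared properties onto each row."""
    shared = {key: value for key, value in bundle.items() if key != "events"}
    return [{**shared, **event} for event in bundle["events"]]


# Hypothetical "old style" data: shared properties stored once per bundle.
bundles = [
    {
        "user_id": "u1", "platform": "ANDROID", "app_version": "1.2.0",
        "events": [
            {"event_name": "session_start", "ts": 1},
            {"event_name": "screen_view", "ts": 2},
            {"event_name": "purchase", "ts": 3},
        ],
    },
    {
        "user_id": "u2", "platform": "IOS", "app_version": "1.1.4",
        "events": [
            {"event_name": "session_start", "ts": 4},
            {"event_name": "screen_view", "ts": 5},
        ],
    },
]

# "New style" data: one row per event, shared properties repeated on every row.
rows = [row for bundle in bundles for row in flatten_bundle(bundle)]

bundled_bytes = len(json.dumps(bundles))
flat_bytes = len(json.dumps(rows))

print(f"{len(bundles)} bundles -> {len(rows)} rows")
print(f"serialized size: {bundled_bytes} -> {flat_bytes} bytes")
```

In this toy data the row count goes from 2 to 5 and the serialized size grows, because every row now carries its own copy of the shared properties. The same effect, at a much larger scale, would explain the growth seen in the migrated tables.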

So the larger storage size should come with slightly faster processing. But I can totally imagine the sticker shock when you see the difference and realize the improved speed doesn't always make up for it. I apologize if that is the case, and I have been assured that we don't have any other big schema changes planned from here on.