Firebase offers private backups on Google Cloud Storage. One of the featured use cases is "Ingestion into Analytics Products":
Private Backups provides a perfect pipeline into cloud analytics products such as Google’s BigQuery. Cloud Analytics products often prefer to ingest data through Cloud Storage buckets rather than directly from the application.
I have a lot of data in Firebase (more than 1 GB when exported to a Cloud Storage bucket) and, as the Firebase offering describes, I wanted to load that data into BigQuery.
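For context, the bucket-to-BigQuery leg of that pipeline looks simple enough. Here is a minimal sketch using the google-cloud-bigquery Python client, assuming the file were newline-delimited JSON (project, dataset, and bucket names are placeholders):

from google.cloud import bigquery

# Minimal sketch: load a file from Cloud Storage into a BigQuery table.
# All names are placeholders; assumes newline-delimited JSON input.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # or supply an explicit schema instead
)
job = client.load_table_from_uri(
    "gs://my-backup-bucket/firebase-export.json",
    "my-project.my_dataset.dinosaurs",
    job_config=job_config,
)
job.result()  # block until the load job completes

The plumbing is not the problem; the shape of the exported JSON is.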
But is it really possible to write a table schema that fits raw Firebase data? Take the dinosaur-facts database from the Firebase documentation as an example. The JSON looks like this:
{
  "dinosaurs" : {
    "bruhathkayosaurus" : {
      "appeared" : -70000000,
      "height" : 25
    },
    "lambeosaurus" : {
      "appeared" : -76000000,
      "height" : 2.1
    }
  },
  "scores" : {
    "bruhathkayosaurus" : 55,
    "lambeosaurus" : 21
  }
}
To list all dinosaurs, I suppose the only way is to use a RECORD field in the BigQuery schema. But a RECORD in BigQuery usually corresponds to an array in the imported JSON, and there is no array here in the Firebase data, just an object whose keys are the dinosaur names.
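To make that concrete, REPEATED mode would presumably need the export to contain an actual array, with each key lifted into an explicit field. A small sketch of that reshaping (the helper name is mine):

import json

# Hypothetical helper: lift each key of a Firebase-style object into an
# explicit "name" field, yielding the JSON array that a REPEATED RECORD
# could map onto.
def keyed_object_to_array(obj):
    return [{"name": key, **value} for key, value in obj.items()]

dinosaurs = {
    "bruhathkayosaurus": {"appeared": -70000000, "height": 25},
    "lambeosaurus": {"appeared": -76000000, "height": 2.1},
}
print(json.dumps(keyed_object_to_array(dinosaurs)))
# [{"name": "bruhathkayosaurus", "appeared": -70000000, "height": 25},
#  {"name": "lambeosaurus", "appeared": -76000000, "height": 2.1}]

The raw backup, of course, contains no such arrays.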
So a BigQuery table schema like this doesn't work:
[
  {
    "name": "dinosaurs",
    "type": "RECORD",
    "mode": "REQUIRED",
    "fields": [
      {
        "name": "dinosaur",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
          { "name": "appeared", "type": "INTEGER" },
          { "name": "height",   "type": "FLOAT"   },
          { "name": "length",   "type": "INTEGER" },
          { "name": "order",    "type": "STRING"  },
          { "name": "vanished", "type": "INTEGER" },
          { "name": "weight",   "type": "INTEGER" }
        ]
      }
    ]
  },
  {
    "name": "scores",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "dinosaur", "type": "INTEGER" }
    ]
  }
]
Is it possible to write a table schema that fits the raw Firebase data? Or do we need to prepare the data first to make it compatible with BigQuery?
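If preparing the data is the answer, here is the kind of transformation I have in mind, as a sketch (file names are placeholders and the flattening choices are mine): rewrite the backup as newline-delimited JSON, one row per dinosaur, joining in the score from the sibling scores object.

import json

# Sketch of a preprocessing step: flatten the Firebase export into
# newline-delimited JSON, one row per dinosaur. File names are placeholders.
with open("firebase-export.json") as src, open("dinosaurs.ndjson", "w") as dst:
    export = json.load(src)
    scores = export.get("scores", {})
    for name, facts in export["dinosaurs"].items():
        row = {"name": name, "score": scores.get(name), **facts}
        dst.write(json.dumps(row) + "\n")

The table schema would then shrink to flat columns (name, score, appeared, height, and so on), at the cost of an extra step between the backup and BigQuery.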