0
votes

I am trying to use bigquery to query data from google cloud storage.

enter image description here

Those are the data for my real time DB in firebase. It consists of json file. How can I query through and see data inside each file?

I am actually reading through this but I don't understand how to connect.

https://cloud.google.com/bigquery/external-data-cloud-storage

Update

enter image description here

enter image description here

If I point to particular file (like firebase.json), it is okay. But i need to update my data.

enter image description here

1
What is unclear? What did you try?Elliott Brossard
I don't see document/tutorial how to connect to firebase realtime DB from bigquery (also to query files from Google Cloud Storage which firebase realtime DB backup automatically).Khant Thu Linn
You can define a temporary or permanent external table over a JSON file stored on GCS, as described in the documentation that you linked.Elliott Brossard
How can I get JSON from .GZ file? It is because firebase automatically backup in that format. If possible, I need to tap data from firebase real time db directly.Khant Thu Linn

1 Answers

2
votes

Using UI by 2018-10-29

If you want to do it from the UI. Go to bigquery, at the left click on your

project -> dataset -> at the right click on cleater Table

Fill the rest of the information

  • Create table from: Google Cloud Storage
  • Select from GCS bucket: path to your database location on GCS
  • File format: JSON
  • Under the schema section click auto detect schema

That should do the work using the new UI by 2018-10-29

Using CLI

You can do it from google console as well using bigquery CLI like

bq mk --external_table_definition=gs://yourbucket/path/yourdb --destination_format NEWLINE_DELIMITED_JSON --autodetect yourdataset.yourtablename

Last arguments is your dataset name dot your table name.

You don't need to specify a file, you can point to a folder. About compression, from BigQuery documentation

If you use gzip compression BigQuery cannot read the data in parallel. Loading compressed JSON data into BigQuery is slower than loading uncompressed data.

So it is supported, but not recommended from a performance point of view due the limitations of the gz format, Anyway if should have several small files, it shouldn't matter, I think it is more related to big files gziped.

EDIT: If you need to point multiple files, you can use wildcard like --external_table_definition=gs://yourbucket/path/yourdb/*