0
votes

I am using BigQueryIO to publish data into BigQuery from a Google Dataflow job.

AFAIK, BigQuery can be used to query data from Google Cloud Storage, Google Drive and Google Sheets.

But when we store data using BigQueryIO, where is the data stored? Is it in Google Cloud Storage?

3 Answers

5
votes

Short answer: BigQueryIO writes to and reads from BigQuery tables, so the data is stored in BigQuery itself.

To go a little deeper:
BigQuery stores data in the Capacitor columnar data format, and offers the standard database concepts of tables, partitions, columns, and rows.

It manages the technical aspects of storing your structured data, including compression, encryption, replication, performance tuning, and scaling.

You can read more about BigQuery's different components in the BigQuery Overview.
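As a rough illustration of the short answer, here is a minimal sketch of a pipeline that writes rows to a BigQuery table. It assumes the Apache Beam Python SDK, where the BigQueryIO transforms are exposed as `beam.io.WriteToBigQuery` and `beam.io.ReadFromBigQuery`. The project, dataset, table, and field names are hypothetical placeholders, and `table_spec` is a helper made up for this example.

```python
def table_spec(project, dataset, table):
    """Format a table reference as 'project:dataset.table' (hypothetical helper)."""
    return f"{project}:{dataset}.{table}"


def run(argv=None):
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder rows; a real job would read these from some source.
    rows = [
        {"name": "alice", "score": 10},
        {"name": "bob", "score": 7},
    ]

    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "CreateRows" >> beam.Create(rows)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table_spec("my-project", "my_dataset", "my_table"),
                schema="name:STRING,score:INTEGER",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Calling `run()` with suitable pipeline options would append the rows to the table. Depending on the write method, Beam may stage temporary files in a Cloud Storage temp location before loading them, but the table data itself ends up in BigQuery's own managed storage.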

3
votes

Cloud Storage is a separate service from BigQuery. Internally, BigQuery manages its own storage.

So, if you save your data to Cloud Storage and then use the bq command to load a BigQuery table from a file in Cloud Storage, there are now two copies of the data.
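For reference, the load step described above might look something like the following sketch, which builds a `bq load` command and runs it with `subprocess`. The dataset, table, and bucket names are hypothetical, the function names are made up for this example, and it assumes the `bq` tool is installed and authenticated.

```python
import subprocess


def bq_load_args(table, gcs_uri, source_format="CSV"):
    """Build the argument list for a 'bq load' call with schema autodetection."""
    return [
        "bq",
        "load",
        "--autodetect",
        f"--source_format={source_format}",
        table,    # e.g. "my_dataset.my_table" (placeholder)
        gcs_uri,  # e.g. "gs://my-bucket/data.csv" (placeholder)
    ]


def load_into_bigquery(table, gcs_uri):
    """Run the load. The source file in Cloud Storage is left untouched."""
    subprocess.run(bq_load_args(table, gcs_uri), check=True)
```

After `load_into_bigquery("my_dataset.my_table", "gs://my-bucket/data.csv")` completes, the file still exists in the bucket and the table holds its own copy of the rows.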

Consequences include:

  • If you delete the Cloud Storage copy, the data will still be in BigQuery.
  • You are charged for storing each copy. I think that, as of April 2017, long-term storage in BigQuery is around $0.01/GB per month, and Cloud Storage is around $0.01-$0.026/GB per month depending on storage class.
  • If the same data is in both Cloud Storage and BigQuery, you are paying twice. Whether it is worthwhile to have a backup copy of the data is up to you.
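To put the "paying twice" point in numbers, here is a small sketch that uses the approximate April 2017 prices quoted in this answer. Those figures are rough recollections, not current pricing, and the function name is made up for this example.

```python
# Approximate per-GB monthly prices from the answer above (circa April 2017).
# These are assumptions for illustration, not current list prices.
BQ_LONG_TERM_PER_GB = 0.01
GCS_PER_GB_LOW = 0.01
GCS_PER_GB_HIGH = 0.026


def monthly_cost_both_copies(size_gb, gcs_price_per_gb):
    """Monthly storage cost when the same data sits in BigQuery and Cloud Storage."""
    bq_cost = size_gb * BQ_LONG_TERM_PER_GB
    gcs_cost = size_gb * gcs_price_per_gb
    return bq_cost + gcs_cost


low = monthly_cost_both_copies(1000, GCS_PER_GB_LOW)
high = monthly_cost_both_copies(1000, GCS_PER_GB_HIGH)
print(f"1000 GB in both services: ${low:.2f} to ${high:.2f} per month")
```

Under those figures, keeping 1,000 GB in both places costs roughly $20 to $36 per month, versus about $10 for the BigQuery copy alone.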

1
votes

BigQuery is a managed data warehouse; put simply, it's a database.

So your data will be stored in BigQuery, and you can access it using SQL queries.
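A minimal sketch of reading the data back with SQL, assuming the `google-cloud-bigquery` Python client library and default credentials. The table name is a hypothetical placeholder, and `build_query` and `fetch_rows` are helpers made up for this example.

```python
def build_query(table, limit=10):
    """Build a simple standard SQL query against a fully qualified table name."""
    return f"SELECT * FROM `{table}` LIMIT {limit}"


def fetch_rows(table, limit=10):
    """Run the query and return the rows as plain dictionaries."""
    # Imported lazily so build_query can be used without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    query_job = client.query(build_query(table, limit))
    return [dict(row.items()) for row in query_job.result()]
```

For example, `fetch_rows("my-project.my_dataset.my_table")` would return up to ten rows from the table that the Dataflow job wrote to.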