I want to create a job which runs at, say, 2 am every morning. This job must create a BigQuery table by reading my files from a Cloud Storage bucket. How can I achieve this?
google-cloud-platform, google-bigquery, google-cloud-functions, google-cloud-storage, google-cloud-scheduler
What is the file format? How do you generate the table name? How many files do you have? Do you want to append to existing tables or truncate them?
– guillaume blaquiere
The files are Firestore exports (.export_metadata). These are exports of my Firestore sub-collections. I have 4 sub-collections, each with 419 documents. I want to truncate the existing tables and reload them.
– nsk
1 Answer
You can import your Firestore export directly into BigQuery. Set up a load job with sourceFormat set to DATASTORE_BACKUP (yes, even for Firestore) and writeDisposition set to WRITE_TRUNCATE.
You can wrap this in a Cloud Function and use either the API directly or the client libraries. If you need a code sample, tell me your language and I will see what I can do for you.
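As a quick one-off check, you can run the same load from the bq CLI; the dataset, table and GCS path below are placeholders, and --replace is the CLI equivalent of writeDisposition WRITE_TRUNCATE:
bq load --source_format=DATASTORE_BACKUP --replace my_dataset.my_table gs://my_bucket/path/to/my_export.export_metadata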
EDIT
You need to add these dependencies to your package.json:
"@google-cloud/bigquery": "^4.7.0",
"@google-cloud/storage": "^5.0.1",
Then, here is the function with static values. You can make it more dynamic if you want (for example, by reading the request parameters).
const {Storage} = require('@google-cloud/storage');
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
const storage = new Storage();

const bucketName = "my_bucket"; // to change
const fileExport = "path/to/my_export.export_metadata"; // to change
const datasetId = "data"; // to change
const tableId = "dsexport"; // to change

exports.loadDSExport = async (req, res) => {
  // Configure the load job. For the full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'DATASTORE_BACKUP',
    autodetect: true,
    location: 'EU', // Set your correct region
    writeDisposition: "WRITE_TRUNCATE",
  };

  // Load data from the Cloud Storage export file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(fileExport), metadata);

  // load() waits for the job to finish.
  // This can take time; increase the function timeout if needed.

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    // Handle the error and return an error code here
    throw errors;
  }

  console.log(`Job ${job.id} completed.`);
  res.send(`Job ${job.id} completed.`);
};
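If you want to handle your 4 sub-collections in one call, a rough sketch (the file paths and table names below are placeholders) is to replace the single load above with a loop inside the function, one export_metadata file per target table:
// Placeholder paths and table names, one entry per sub-collection export
const collectionExports = [
  { file: "path/to/subcollection1.export_metadata", table: "subcollection1" },
  { file: "path/to/subcollection2.export_metadata", table: "subcollection2" },
  { file: "path/to/subcollection3.export_metadata", table: "subcollection3" },
  { file: "path/to/subcollection4.export_metadata", table: "subcollection4" },
];
for (const item of collectionExports) {
  // Each load truncates and reloads its own table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(item.table)
    .load(storage.bucket(bucketName).file(item.file), metadata);
  console.log(`Job ${job.id} completed for table ${item.table}.`);
}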
Then deploy your function like this (here in private mode):
gcloud beta functions deploy --runtime nodejs10 --trigger-http --entry-point loadDSExport --region europe-west1 loadDSExport
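Finally, to trigger it at 2 am every morning, create a Cloud Scheduler job that calls the function's HTTPS URL. The job name, time zone, URL and service account below are placeholders; because the function is private, the scheduler's service account needs the Cloud Functions Invoker role, and the OIDC flag makes Scheduler authenticate the call:
gcloud scheduler jobs create http load-ds-export-nightly \
  --schedule="0 2 * * *" \
  --time-zone="Europe/Paris" \
  --http-method=GET \
  --uri="https://europe-west1-MY_PROJECT.cloudfunctions.net/loadDSExport" \
  --oidc-service-account-email="scheduler-invoker@MY_PROJECT.iam.gserviceaccount.com"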