
I want to create a job which runs at, say, 2 AM every morning. This job must create a BigQuery table by reading my files from a Cloud Storage bucket. How can I achieve this?

1
What is the file format? How do you generate the table name? How many files do you have? Do you want to append to existing tables or truncate them? – guillaume blaquiere
The files are Firestore exports (.export_metadata). These are my Firestore sub-collection exports. I have 4 sub-collections, and each of them has 419 documents. I want to truncate the existing tables and reload them. – nsk

1 Answer


You can directly import your Firestore backup into BigQuery. Set up a load job with sourceFormat equal to DATASTORE_BACKUP (yes, even for Firestore) and writeDisposition set to WRITE_TRUNCATE.
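
If you want to validate the approach before writing any code, an equivalent one-off load with the bq CLI looks roughly like this (the dataset, table, bucket and path below are placeholders to replace with your own):

bq --location=EU load --source_format=DATASTORE_BACKUP --replace \
    my_dataset.my_table \
    gs://my_bucket/path/to/my_export.export_metadata

The --replace flag is the CLI equivalent of WRITE_TRUNCATE.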

You can wrap this in a Cloud Function. You can use the API directly or the client libraries. If you need a code sample, tell me your language and I will see what I can do for you.

EDIT

You need to add these dependencies to your package.json:

    "@google-cloud/bigquery": "^4.7.0",
    "@google-cloud/storage": "^5.0.1",

Then, here is the function with static values. You can build something more dynamic if you want (by reading the function parameters, for example).

const {Storage} = require('@google-cloud/storage');
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
const storage = new Storage();
const bucketName = "my_bucket"; // to change
const fileExport = "path/to/my_export.export_metadata"; // to change
const datasetId = "data"; // to change
const tableId = "dsexport"; // to change
exports.loadDSExport = async (req, res) => {

    // Configure the load job. For full list of options, see:
    // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
    const metadata = {
        sourceFormat: 'DATASTORE_BACKUP',
        autodetect: true,
        location: 'EU', // Set your correct region
        writeDisposition: "WRITE_TRUNCATE",
    };

    // Load data from a Google Cloud Storage file into the table
    const [job] = await bigquery
        .dataset(datasetId)
        .table(tableId)
        .load(storage.bucket(bucketName).file(fileExport), metadata);
    // load() waits for the job to finish
    // Can take time, increase function timeout if needed

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
        //Handle error and return code here
        throw errors;
    }

    console.log(`Job ${job.id} completed.`);
    res.send(`Job ${job.id} completed.`);
};
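
If you prefer the more dynamic variant mentioned above, here is a sketch that reads the values from the request query string instead of constants (the parameter names bucket, file, dataset and table, and the export name loadDSExportDynamic, are my own choices, not an established convention):

const {Storage} = require('@google-cloud/storage');
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
const storage = new Storage();

exports.loadDSExportDynamic = async (req, res) => {
    // Read the target from the request instead of hard-coded constants
    const {bucket, file, dataset, table} = req.query;
    if (!bucket || !file || !dataset || !table) {
        res.status(400).send('Missing bucket, file, dataset or table parameter.');
        return;
    }

    const metadata = {
        sourceFormat: 'DATASTORE_BACKUP',
        autodetect: true,
        location: 'EU', // Set your correct region
        writeDisposition: 'WRITE_TRUNCATE',
    };

    // Same load job as above, but fully parameterized
    const [job] = await bigquery
        .dataset(dataset)
        .table(table)
        .load(storage.bucket(bucket).file(file), metadata);

    const errors = job.status.errors;
    if (errors && errors.length > 0) {
        console.error(errors);
        res.status(500).send(`Job ${job.id} failed.`);
        return;
    }
    res.send(`Job ${job.id} completed.`);
};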

Then, deploy your function like this (here in private mode):

gcloud beta functions deploy --runtime nodejs10 --trigger-http --entry-point loadDSExport --region europe-west1 loadDSExport
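
Finally, since the question asks for a run at 2 AM every morning, one way to trigger this private function on a schedule is Cloud Scheduler with an OIDC token. A sketch, where PROJECT_ID, the service account e-mail and the time zone are placeholders, and the service account must have the Cloud Functions Invoker role:

gcloud scheduler jobs create http load-ds-export-nightly \
    --schedule="0 2 * * *" \
    --time-zone="Europe/Paris" \
    --http-method=GET \
    --uri="https://europe-west1-PROJECT_ID.cloudfunctions.net/loadDSExport" \
    --oidc-service-account-email="scheduler-invoker@PROJECT_ID.iam.gserviceaccount.com"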