I am importing data from GCS into BigQuery using Dataflow's predefined GCS to BigQuery template. The data is processed with a JavaScript UDF.

I would like to exclude some records from being inserted into BigQuery. Is there a way to do this with the JavaScript UDF?

1 Answer

For the records you would like to skip, return undefined from the UDF (a function that ends without a return statement does this implicitly). Those records will not be included in the output.

You can check out an example of this functionality here: https://github.com/GoogleCloudPlatform/DataflowTemplates#filtering-records

/**
 * A transform function which only accepts 42 as the answer to life.
 * @param {string} inJson
 * @return {string} outJson
 */
function transform(inJson) {
  var obj = JSON.parse(inJson);
  // only output objects which have an answer to life of 42.
  if (obj.hasOwnProperty('answerToLife') && obj.answerToLife === 42) {
    return JSON.stringify(obj);
  }
}
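
The same pattern should extend to other exclusion rules. Here is a minimal sketch, assuming hypothetical input fields `status` and `id` (neither comes from the question), that skips records marked as deleted or missing an id. The lines after the function are only a local check you can run with Node.js:

```javascript
/**
 * Hypothetical filter; the field names `status` and `id` are
 * assumptions made for illustration only.
 * @param {string} inJson
 * @return {string|undefined} outJson, or undefined to skip the record
 */
function transform(inJson) {
  var obj = JSON.parse(inJson);
  // Returning undefined drops the record from the output.
  if (obj.status === 'deleted' || obj.id === undefined) {
    return undefined;
  }
  return JSON.stringify(obj);
}

// Local check with Node.js (not part of the UDF itself).
var kept = transform('{"id": 1, "status": "active"}');
var skipped = transform('{"id": 2, "status": "deleted"}');
console.log(kept);
console.log(skipped);
```

Remove the local check lines before uploading the file, since the JavaScript engine used by the template may not provide `console`.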