2
votes

I am trying out querying the table using UDF in BigQuery[or through Talend] in my preferred JSON as output. I have gone through the link which explains usage of Inline and external UDFs. But I could not figure a way to execute UDFs from CLI.

Is it possible to execute the external UDF from CLI [bq or gsutil], which I can use it via Talend Data Integerator tool.Can anyone suggest any pointer to this?

1

1 Answers

2
votes

You can run a UDF via the "bq" command line tool by specifying the --udf_resource flag. You can set the flag value to a gs:// URL or to the name of a local file.

For example, you can run the urlDecode UDF from the UDF documentation as follows:

$ cat urldecode.js
// UDF definition
function urlDecode(row, emit) {
  emit({title: decodeHelper(row.title),
        requests: row.num_requests});
}

// Helper function with error handling
function decodeHelper(s) {
  try {
    return decodeURI(s);
  } catch (ex) {
    return s;
  }
}

// UDF registration
bigquery.defineFunction(
  'urlDecode',  // Name used to call the function from SQL

  ['title', 'num_requests'],  // Input column names

  // JSON representation of the output schema
  [{name: 'title', type: 'string'},
   {name: 'requests', type: 'integer'}],

  urlDecode  // The function reference
);

$ cat query.sql
SELECT requests, title
FROM
  urlDecode(
    SELECT
      title, sum(requests) AS num_requests
    FROM
      [fh-bigquery:wikipedia.pagecounts_201504]
    WHERE language = 'fr'
    GROUP EACH BY title
  )
WHERE title LIKE '%รง%'
ORDER BY requests DESC
LIMIT 100

$ bq query --udf_resource=urldecode.js "$(cat query.sql)"