0
votes

Against a BigQuery table, I'm trying to run a SQL statement calling a UDF. This statement is executed within a Python script and the call is made via the BigQuery API.

When I execute a simple SQL statement without a UDF, it works fine. However, I keep getting the same error when I try to use a UDF script (stored either locally or in a GCS bucket). This what I get on my local Terminal (I run the script via Python Launcher):

Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute raise HttpError(resp, content, uri=self.uri) googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/[projectId]/queries?alt=json returned "Required parameter is missing">

And this is my Python script:

credentials = SignedJwtAssertionCredentials(
SERVICE_ACCOUNT_EMAIL,
key,
scope='https://www.googleapis.com/auth/bigquery')

aservice = build('bigquery','v2',credentials=credentials)
query_requestb = aservice.jobs()

query_data = {
    'configuration': {
        'query': {
            'userDefinedFunctionResources': [
                {
                   'resourceUri': 'gs://[bucketName]/[fileName].js'
                }
            ],
            'query': sql
        }
    },
    'timeoutMs': 100000
}

query_response = query_requestb.query(projectId=PROJECT_NUMBER,body=query_data).execute(num_retries=0)

Any idea what 'parameter is missing' or how I can get this to run?

2

2 Answers

4
votes

Instead of specifying userDefinedFunctionResources, use CREATE TEMP FUNCTION in the body of your 'query' with the library referenced as part of the OPTIONS clause. You will need to use standard SQL for this, and you can also refer to the documentation on user-defined functions. Your query would look something like this:

#standardSQL
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
  return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');

SELECT MyJsFunction(x)
FROM MyTable;
0
votes

The query I wanted to run was to categorise traffic and sales by marketing channel which I usually use a UDF for. This is the query I ran using standard SQL. This query is stored in a file which I read and store in the variable sql:

CREATE TEMPORARY FUNCTION
  mktchannels(source STRING,
    medium STRING,
    campaign STRING)
  RETURNS STRING
  LANGUAGE js AS """
return channelGrouping(source,medium,campaign) // where channelGrouping is the function in my channelgrouping.js file which contains the attribution rules
  """ OPTIONS ( library=["gs://[bucket]/[path]/regex.js",
    "gs://[bucket]/[path]/channelgrouping.js"] );
WITH
  traffic AS ( // select fields from the BigQuery table
  SELECT
    device.deviceCategory AS device,
    trafficSource.source AS source,
    trafficSource.medium AS medium,
    trafficSource.campaign AS campaign,
    SUM(totals.visits) AS sessions,
    SUM(totals.transactionRevenue)/1e6 as revenue,
    SUM(totals.transactions) as transactions
  FROM
    `[datasetId].[table]`
  GROUP BY
    device,
    source,
    medium,
    campaign)
SELECT
  mktchannels(source,
    medium,
    campaign) AS channel, // call the temp function set above
  device,
  SUM(sessions) AS sessions,
  SUM(transactions) as transactions,
  ROUND(SUM(revenue),2) as revenue
FROM
  traffic
GROUP BY
  device,
  channel
ORDER BY
  channel,
  device;

And then in the Python script:

fd = file('myquery.sql', 'r')
sql = fd.read()
fd.close()

query_data = {
    'query': sql,
    'maximumBillingTier': 10,
    'useLegacySql': False,
    'timeoutMs': 300000
}

Hope this helps anyone in the future!