3
votes

I am wondering whether a recursive UDF function in BigQuery is the right solution to what i'm doing. But first, is it possible to run a query from inside the UDF?

I see a similar question here: BigQuery : is it possible to execute another query inside an UDF? but the solution seems to be a workaround that executes straight SQL. In my case, I might have to call the UDF repeatedly/recursively without knowing in advance the number of steps (say 3-7 steps).

It's a simple use-case of building a relationship graph over user-name entries in a table, with X degrees of separation, where X will be supplied by the end-user as an argument. My guess is recursive-style UDF would work well, but is it possible?

****EDIT: More detail on the use-case:**
Consider a table with transaction data, which contains the counterparts in each row, along with some other information:

Buyer, Seller
Bob->Alice
Bob->Carol
Bob->John
John-Peter
John-Sam
Bob->Mary

Suppose I want to visualize the relationship of Bob with his counterparts, with 1 degree of separation (i.e. also showing each counterpart's relationships 1 step removed from Bob). I want to use the force graph like this one here: D3 Force-Collapsible Graph

This graph requires a .JSON file with the following structure:

{
    "name": "Bob", "size":5000,
    "children":
        [
            {"name":"Alice","size":3000},
            {"name":"Carol","size":3000},
            {"name":"John","size":3000,
            "children":[
                         {"name":"Peter","size":3000},
                         {"name":"Sam","size":3000}
                        ]},
            {"name":"Mary","size":3000}
        ]
}

so, with 1 degree of separation, Bob has 4 children, and out of those, John has 2 children. This can go deeper with X degrees of separation, ideally with X provided by the user, but practically can also be hard-coded to say level 3 or 5.

2
Are you using legacy or standard SQL (they handle UDFs differently) e.g. cloud.google.com/bigquery/docs/reference/standard-sql/…Graham Polley
Current implementation of BigQuery UDF does not support recursive calls. I suggest you to rather present your use-case along with how you try to solve it and what kind of issue you have so someone will help you .Mikhail Berlyant
The use case is to generate a collapsible tree similar to link where the number of branches away from the center will be supplied by the user. The relations data is in BigQuery in simple UserA->UserB format and the front end is served via Node.JS as shown in the BigQuery documentation. The answer below based on GENERATE_ARRAY could work. Will investigate... ** as to legacy/standard SQL, I'd use whatever works for the task!VS_FF
ideally, you should present example of your input data and desired output, otherwise it will be abstract back and forth based on assumptions - which usually is not productive at allMikhail Berlyant
Sorry for the delay, I updated the main question to include more details on the use-case; hope it's more clear now...VS_FF

2 Answers

2
votes

Try below
It is generic enough and has quite simple pattern to follow if you will need to extend it to more degrees of separation
For the sake of example I introduced logic for size attribute - which is (in below example) literally size of item in terms of number of items in in it (including itself) - so it is essentially count of children + 1
So, enjoy:

#standardSQL
CREATE TEMP FUNCTION size(item STRING) AS (
  (SELECT CAST(IFNULL(1 + (LENGTH(item) - LENGTH(REPLACE(item, 'name', '')))/4, 1) AS STRING))
);
CREATE TEMP FUNCTION dress(parent STRING, children STRING) AS (
  (SELECT CONCAT('{"name":"', parent, '","size":', size(children), IFNULL(CONCAT(',"children":[', children, ']'), ''), '}'))
);
WITH items AS (
  SELECT 'Bob' AS parent, 'Alice' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Carol' AS child UNION ALL
  SELECT 'Bob' AS parent, 'John' AS child UNION ALL
  SELECT 'John' AS parent, 'Peter' AS child UNION ALL
  SELECT 'John' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Peter' AS parent, 'Sam' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Mike' AS child UNION ALL
  SELECT 'Sam' AS parent, 'Nick' AS child UNION ALL
  SELECT 'Bob' AS parent, 'Mary' AS child 
), degree2 AS ( 
  SELECT d1.parent AS parent, d1.child AS child_1, d2.child AS child_2
  FROM items AS d1 LEFT JOIN items AS d2 ON d1.child = d2.parent
), degree3 AS (
  SELECT d1.*, d2.child AS child_3 
  FROM degree2 AS d1 LEFT JOIN items AS d2 ON d1.child_2 = d2.parent
), degree4 AS (
  SELECT d1.*, d2.child AS child_4 
  FROM degree3 AS d1 LEFT JOIN items AS d2 ON d1.child_3 = d2.parent
)
SELECT STRING_AGG(dress(parent, child_1), ',') AS parent FROM (
SELECT parent, STRING_AGG(dress(child_1, child_2), ',') AS child_1 FROM (
SELECT parent, child_1, STRING_AGG(dress(child_2, child_3), ',') AS child_2 FROM (
SELECT parent, child_1, child_2, STRING_AGG(dress(child_3, child_4), ',') AS child_3 FROM (
SELECT parent, child_1, child_2, child_3, STRING_AGG(dress(child_4, NULL), ',') AS child_4 FROM degree4
GROUP BY 1,2,3,4 ORDER BY 1,2,3,4 )
GROUP BY 1,2,3 ORDER BY 1,2,3 )
GROUP BY 1,2 ORDER BY 1,2 ) GROUP BY 1 ORDER BY 1 )  

It returns exactly what you need - see "beautified" version of it below

{"name": "Bob","size": 12,"children": [
    {"name": "Alice","size": 1},
    {"name": "Carol","size": 1},
    {"name": "John","size": 8,"children": [
        {"name": "Peter","size": 4,"children": [
            {"name": "Sam","size": 3,"children": [
                {"name": "Mike","size": 1},
                {"name": "Nick","size": 1} ]}
          ]},
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Mary","size": 1}
  ]},
{"name": "John","size": 8,"children": [
    {"name": "Peter","size": 4,"children": [
        {"name": "Sam","size": 3,"children": [
            {"name": "Mike","size": 1},
            {"name": "Nick","size": 1} ]}
      ]},
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Peter","size": 4,"children": [
    {"name": "Sam","size": 3,"children": [
        {"name": "Mike","size": 1},
        {"name": "Nick","size": 1} ]}
  ]},
{"name": "Sam","size": 3,"children": [
    {"name": "Mike","size": 1},
    {"name": "Nick","size": 1} ]}

Most likely, above can be further generalized - but I thought it is already good enough for you to try :o)

2
votes

You can have a JavaScript UDF make recursive calls, but it can't execute another SQL statement. If you know the number of recursions/iterations in advance, it may be possible to define a SQL function instead, such as:

#standardSQL
CREATE TEMP FUNCTION SumToN(x INT64) AS (
  (SELECT SUM(v) FROM UNNEST(GENERATE_ARRAY(1, x)) AS v)
);

Using GENERATE_ARRAY, you can create a for loop of the desired length. Here's another example that doesn't involve a UDF, but uses GENERATE_ARRAY to concatenate a variable number of strings:

#standardSQL
WITH T AS (
  SELECT 2 AS x, 'foo' AS y UNION ALL
  SELECT 4 AS x, 'bar' AS y)
SELECT
  y,
  (SELECT STRING_AGG(CONCAT(y, CAST(v AS STRING)))
   FROM UNNEST(GENERATE_ARRAY(1, x)) AS v) AS rep_y
FROM T;
+-----+---------------------+
| y   | rep_y               |
+-----+---------------------+
| foo | foo1,foo2           |
| bar | bar1,bar2,bar3,bar4 |
+-----+---------------------+