0
votes

As of June 2nd 2020 BigQuery supports referencing tables and views in UDF. However, the obvious use case (for me) reverts an error that reads: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."

My query is something like this:

SELECT airport, udf_dataset.getcity(airport) AS city
FROM my_table

And the UDF getcity looks like this:

SELECT city_code FROM my_cities_table
WHERE airport_code = airport_input

The my_cities_table has a 1 on 1 reference of what is the city for each airport.

My guess is that doing a JOIN efficiently gets the city value for each airport, whereas the UDF queries for each airport value, creating the correlation. Is my guess right? If so, then supporting table reference in UDF proves less worthy.

1
I just tried and it perfectly worked for me with both temp and permanent UDF. Might be issue somewhere else in your query. Present more details, so we can further help youMikhail Berlyant
It's as direct as that, I'm not simplifying the queries. my_cities_table has about 10k rows and my_table about 2 million, maybe their size has to do with it, especially because I tried with a UDF referencing a table with 40 records and it passes. And even if there was no error, do you see correlating queries there?PBurns
Would you mind sharing some data ? So I can investigate further. I was not able to reproduce your error without having the same/similar data as you.Alexandre Moraes
I did find something else that supports the hypothesis of the sizes. In my previous comment I mention that trying with a UDF that references a table with 40 records passes. That UDF filters a partition in the WHERE clause that narrows the lookup table to 40 records. If I remove the partition filter and use the LIMIT 1 statement, the size of the lookup table es about 35k and the correlation errors comes up. In the submitted examples my_table has 40+ columns and my_cities_table has 10 columns, but I'm only selecting the fields shown, so I don't think that helps.PBurns
I have another clue: Google's documentation says in the limitations section that "Maximum number of unique UDF plus table references per query — 1000. After full expansion, each UDF can reference up to 1000 combined unique tables and UDFs." This may be related, but don't know how to test it. How do I know how many unique UDF plus table references my query has? What exactly is a full expansion?PBurns

1 Answers

1
votes

After some iterations I got an answer from Google and although it doesn't resolve the issue, it does clarify what happens.

A UDF should never be used instead of a JOIN operation, even when the UDF consists on a JOIN operation itself. That creates correlated queries and a staight JOIN is simply more efficient. I was trying to do that in a frequently used conversion, so the code could've been shorter this way.

Regarding the maximum of 1000 unique UDF plus table references per query after full expansion, that refers to the resulting tables when there are wildcards involved in the FROM clause, but in my initial example the count is straightforward.

I hope this helps somebody that was trying to do this as I was.