0
votes

I wrote a function to get the net revenue for an order:

CREATE OR REPLACE FUNCTION
  `project.dataset.function`(inputBuyerCurrency STRING,
    inputCountryCode STRING,
    inputPrice FLOAT64,
    inputOrderTimestamp TIMESTAMP)
  RETURNS FLOAT64 AS ( (
    SELECT
      P
    FROM (
      SELECT
        AVG(APP.Proceeds) AS P,
        App.InsertedTimestamp
      FROM
        `project.dataset.lookup` APP
      LEFT JOIN
        `project.dataset.countrycode_lookup` CC
      ON
        APP.Region = CC.Alpha3
      WHERE
        CC.Alpha2 = inputCountryCode
        AND App.Currency = inputBuyerCurrency
        AND App.Price = inputPrice / 100
        AND App.InsertedTimestamp <= inputOrderTimestamp
      GROUP BY
        App.InsertedTimestamp
      ORDER BY
        InsertedTimestamp DESC
      LIMIT
        1)) );

The parameters are self explanatory; the currency in which the order was placed, the country of the buyer, the price and the timestamp the order was placed.

The project.dataset.lookup-table gets updated every day with the new NetRevenue-values. What I want is to check which was the last matching row in the project.dataset.lookup before the order was placed. So a order of 2021-01-01 22:00:00 (for example) should match something with an InsertTimestamp of 2021-01-01 01:00:00.

When I 'call' the function with hard coded values, no problems occur. I get the result I want. However, when I try to call the function with values from a table, I get the following error: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN..

So, this works:

SELECT `project.dataset.function`("EUR", "NL", 1999, 1626269392)

But this doesn't:

SELECT `project.dataset.function`(CurrencyCode, CountryCode, Price, Timestamp)
FROM `project.dataset.order`

Thanks a lot in advance.

1

1 Answers

0
votes

Imagine your order table is 1M rows... your function would execute its query 1M times, once per row, which is insane. This is the correlated subquery. When you hard-code the values into your function, it really only is running 1 time.

Consider making your "subquery" into a view, and include columns to join back to your order table.