1
votes

This is a variation on "plpgsql function that returns multiple columns gets called multiple times". However, I was hoping to find a solution to my particular set of circumstances.

I have a function that processes an array of rows with a given parameter, and returns a set of rows + a new column.

CREATE OR REPLACE FUNCTION foo(data data[], parameter int) RETURNS SETOF enhanceddata AS
...

The function works on a test case with only one set of data:

SELECT * FROM foo( (SELECT ARRAY_AGG(data) FROM datatable WHERE dataid = something GROUP BY dataid), 1)

But I would like to make it work with multiple groups of data, without passing a dataid to the function. I tried a number of variations of:

SELECT dataid, (foo(ARRAY_AGG(data), 1)).*
FROM dataset
WHERE dataid = something -- only testing on 1
GROUP BY dataid

But the function gets called once for every column.

2
And enhanceddata is a composite type, I presume? The definition of which would be essential to your question. Because the problem you describe only applies if multiple columns are returned (not for multiple rows). - Erwin Brandstetter

2 Answers

4
votes

In Postgres 9.3 or later, it's typically best to use LEFT JOIN LATERAL ... ON true:

SELECT sub.dataid, f.*
FROM  (
   SELECT dataid, array_agg(data) AS arr
   FROM   dataset
   WHERE  dataid = something
   GROUP  BY 1
   ) sub
LEFT   JOIN LATERAL foo(sub.arr) f ON true;

If the function foo() can return no rows, that's the safe form as it preserves all rows to the left of the join, even when no row is returned to the right.

Else, or if you want to exclude rows without result from the lateral join, use:

CROSS JOIN LATERAL foo(sub.arr)

or the shorthand:

, foo(sub.arr)

There is an explicit mention in the manual.
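To illustrate the difference between the two join forms, here is a minimal sketch. The function demo_series() and the inline VALUES list are hypothetical stand-ins, not part of the question; the function returns n rows, so it returns no rows for n = 0.

```sql
-- Hypothetical set-returning function, for illustration only:
-- returns the integers 1..n, hence no rows at all when n = 0.
CREATE OR REPLACE FUNCTION demo_series(n int)
  RETURNS TABLE (i int)
  LANGUAGE sql AS
$$ SELECT generate_series(1, n) $$;

-- LEFT JOIN LATERAL keeps id = 2, with i = NULL,
-- even though demo_series(0) returns no rows.
SELECT t.id, f.i
FROM  (VALUES (1, 2), (2, 0)) AS t(id, n)
LEFT   JOIN LATERAL demo_series(t.n) f ON true;

-- CROSS JOIN LATERAL drops id = 2 from the result.
SELECT t.id, f.i
FROM  (VALUES (1, 2), (2, 0)) AS t(id, n)
CROSS  JOIN LATERAL demo_series(t.n) f;
```

Either way, the function is called once per row of the left-hand side, and its columns are expanded from that single result.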

Craig's related answer (referenced by Daniel) is updated accordingly.

1
votes

The function is called multiple times in this context not because of its inputs, but because of how (func()).* is implemented: the parser expands it into one function call per output column.

This is explained at: How to avoid multiple function evals with the (func()).* syntax in an SQL query?
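A small way to observe the effect, assuming a hypothetical composite type demo_pair and function demo_f() (neither is from the question); the RAISE NOTICE makes each call visible in the client:

```sql
-- Hypothetical type and function, for illustration only.
CREATE TYPE demo_pair AS (a int, b int);

CREATE OR REPLACE FUNCTION demo_f(x int) RETURNS demo_pair
  LANGUAGE plpgsql AS
$$
DECLARE
   r demo_pair;
BEGIN
   RAISE NOTICE 'demo_f called with %', x;  -- one notice per call
   r.a := x;
   r.b := x * 2;
   RETURN r;
END
$$;

-- Expanded by the parser to (demo_f(1)).a, (demo_f(1)).b,
-- so a notice should appear for each column of demo_pair.
SELECT (demo_f(1)).*;

-- Called in the FROM clause instead: the function is evaluated
-- once and its columns are expanded from that single result.
SELECT * FROM demo_f(1);
```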

The following variant should work without multiple evals on all supported PostgreSQL versions (8.4 or newer):

WITH subq AS (
   SELECT array_agg(data) AS agg, dataid
   FROM   datatable
   -- WHERE clause ?
   GROUP  BY dataid
   )
SELECT foo(agg, dataid) FROM subq;
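The query above returns each result as a single composite column. If separate columns are wanted, one possible sketch is to make the call in a second CTE and expand it only in the outer query. The CTE name called and the alias f are made up for this example, and foo() is called with the same arguments as above:

```sql
WITH subq AS (
   SELECT array_agg(data) AS agg, dataid
   FROM   datatable
   GROUP  BY dataid
   )
, called AS (
   -- foo() is called here, once per group; f holds the composite result
   SELECT dataid, foo(agg, dataid) AS f
   FROM   subq
   )
SELECT dataid, (f).*   -- expands the stored composite value, not the call
FROM   called;
```

This relies on the CTE being computed separately from the outer query. On PostgreSQL 12 or later, where a CTE may be inlined, writing called AS MATERIALIZED (...) is a way to request that explicitly.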