3
votes

I'm working on a PL/pgSQL function "interpolate_values" with some time-consuming computations. A table called "interpolation_jobs" contains surveillance information about every function call, such that the progress of a function call with a given job_id should be observable by

SELECT status FROM interpolation_jobs WHERE id = job_id;

The column "status" contains one of the values 'queued', 'running' or 'done'. At the beginning of the function, the status is changed from 'queued' to 'running', at the end it is set to 'done':

CREATE OR REPLACE FUNCTION interpolate_values (job_id INTEGER)
RETURNS VOID
LANGUAGE plpgsql VOLATILE
AS $$
DECLARE
BEGIN
    EXECUTE 'UPDATE interpolation_jobs
        SET status = ''running'', progress = 0.0
        WHERE id = ' || job_id || ';';

--
-- ... some extensive database computations ...
--

    EXECUTE 'UPDATE interpolation_jobs
        SET status = ''done''
        WHERE id = ' || job_id || ';';

END;
$$;

My problem is that the status is not updated during the function call. The updates actually take place when the function call returns. So, the status directly changes from 'queued' to 'done'. The lines

    EXECUTE 'UPDATE interpolation_jobs
        SET status = ''running'', progress = 0.0
        WHERE id = ' || job_id || ';';

have no effect.

Is there a possibilty to update the values immediately in PL/pgSQL, so that new values are accessible before the function call returns?

Thanks!

EDIT:

Thank you for all your answers which helped me a lot to understand the general problems of asynchronous database operations. The dblink approach works for me. It is not necessary to specify IP/Port/User if the same database is used:

SELECT * FROM current_database() INTO _db_name;
PERFORM dblink_connect('dbname=' || _db_name);
PERFORM dblink_exec('UPDATE interpolation_jobs SET status = ''running'' WHERE id =' || _job_id);

-- 
-- ... some extensive database computations ...
--

PERFORM dblink_exec('UPDATE interpolation_jobs SET status = ''done'' WHERE id =' || _job_id);
PERFORM dblink_disconnect();
2
In short: no. The updates to the status-table are part of the transaction and can only be seen by other processes once the transaction has been committed. You would need asynchroneous notification to accomplish what you want. see 'LISTEN' or 'NOTIFY' in the fine manual.wildplasser

2 Answers

4
votes

What you appear to want is a dirty read or dirty write. These are not available in PostgreSQL and are not likely to ever be supported.

A close relative is an autonomous transaction. Again, these are not supported by PostgreSQL though work to add them is in progress.

You will find that even with autonomous transactions it's very hard to write your own efficient queuing system. Don't go down this path, use one someone has already written. Queueing systems are hard to write well, particularly against a RDBMS. Existing systems like ZeroMQ, RabbitMQ, PGQ, etc may be well worth evaluating as alternatives.

There's been recent discussion of adding a feature to SELECT ... FOR UPDATE that would let PostgreSQL skip locked rows and grab the first row that is not locked. This feature is not yet available and will not be available until 9.4 at the absolute earliest, so don't hold your breath.

3
votes

You can use dblink to connect to your database and perform a query that will be immiedietely commited:

CREATE OR REPLACE FUNCTION interpolate_values (_job_id INTEGER)
RETURNS VOID
LANGUAGE plpgsql VOLATILE
AS $$
DECLARE
_conn TEXT;
_status TEXT;
BEGIN
    _conn:='hostaddr=127.0.0.1 port=5433 dbname=<db> user=<user> password=<pass>';
    _status='running';
    PERFORM dblink_exec(_conn,'UPDATE interpolation_jobs SET status = '''||_status||''', progress = 0.0 WHERE id ='||_job_id);
    PERFORM pg_sleep(10); --simulate some time consuming calculations
    _status='finished';
    PERFORM dblink_exec(_conn,'UPDATE interpolation_jobs SET status = '''||_status||''', progress = 100.0 WHERE id ='||_job_id);
END;
$$;