According to the postgres docs, one process in a transaction should
not "see" changes made by another process in a transaction until they
are committed.
Yes and No - as usual, it depends. The documentation strictly says that:
Read Committed is the default isolation level in PostgreSQL.
When a transaction uses this isolation level, a SELECT query (without
a FOR UPDATE/SHARE clause) sees only data committed before the query
began; it never sees either uncommitted data or changes committed
during query execution by concurrent transactions. In effect, a SELECT
query sees a snapshot of the database as of the instant the query
begins to run. However, SELECT does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed. Also note that two successive SELECT commands can see
different data, even though they are within a single transaction, if
other transactions commit changes after the first SELECT starts and
before the second SELECT starts.
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time. However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress). If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row. The
search condition of the command (the WHERE clause) is re-evaluated to
see if the updated version of the row still matches the search
condition. If so, the second updater proceeds with its operation using
the updated version of the row. In the case of SELECT FOR UPDATE and
SELECT FOR SHARE, this means it is the updated version of the row that
is locked and returned to the client.
In other word, simply SELECT differs from SELECT FOR UPDATE/DELETE/UPDATE.
You can create simple test case to observe that behaviour:
Session 1
test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
x
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
test=> DELETE FROM test;
DELETE 10
test=>
Now login in another Session 2:
test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
x
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;
After the last command SELECT ... FOR UPDATE session 1 "hangs" and is waiting for something ......
Back in session 1
test=> insert into test select * from generate_series(1,10);
INSERT 0 10
test=> commit;
COMMIT
And now when you go back to session 2 you will see this:
test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;
x
---
(0 rows)
test=> select * from test;
x
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
That is - simple SELECT still doesn't see any changes, while SELECT ... FOR UPDATE does see that rows have been deleted. But it doesn't see new rows inserted by session 1
In fact a sequence you are seeing is:
- process A starts its transaction
- process A deletes everything from table T
- process B starts its transaction
- process B attempts a select for update on one row in table T
- process B "hangs" and is waiting until session A does a commit or rollback
- process A repopulates table T from incoming data
- process A commits its transaction
- process B comes up empty (0 rows- after session A commit) and calls rollback