4
votes

I've seen a bunch of similar questions here, but my question is, I guess, more specific.

This is the best solution I've found for the current version of Postgres (9.4): http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/ http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Generally, the suggested solution follows this algorithm: insert one row; if an error occurs, handle it; if not, insert the next one.

If all I need is to ignore duplicate-key errors, can I just do this:

INSERT INTO tablename (id, name, surname) VALUES (1, 'john', 'doe');
INSERT INTO tablename (id, name, surname) VALUES (2, 'jane', 'smith');

instead of this:

INSERT INTO tablename (id, name, surname) VALUES (1, 'john', 'doe'), (2, 'jane', 'smith');

...when I only have to insert ~5-30 rows at a time? Some of the inserted rows would simply return duplicate errors, while the rest would be executed successfully. That is all I actually need.

I've tried to compare the cost of the two approaches by running EXPLAIN ANALYZE on 100 rows inserted at once and on 100 rows inserted separately. I'm probably doing something wrong, because it shows ~25-50 times lower execution time when I insert the rows separately, as in the first example above.

Here is the query wrapper I was using:

BEGIN;
EXPLAIN ANALYZE
-- query or queriES here
ROLLBACK;

So, the question is: why am I getting these numbers? Maybe EXPLAIN ANALYZE shows the execution time for each row instead of for the whole query? Then it would make sense: in that case, the bulk insert would cost ~3 times less than the separate commands. Right?
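For reference, one way to do exactly this on 9.4, in the spirit of the linked depesz/docs examples, is to catch `unique_violation` per row in a PL/pgSQL `DO` block. This is a minimal sketch, assuming the duplicates come from a unique constraint on `id` in `tablename`:

```sql
-- Sketch for Postgres 9.4: insert each row and swallow duplicate-key
-- errors, so the remaining rows are still inserted.
DO $$
BEGIN
  BEGIN
    INSERT INTO tablename (id, name, surname) VALUES (1, 'john', 'doe');
  EXCEPTION WHEN unique_violation THEN
    NULL;  -- duplicate: ignore and continue with the next row
  END;
  BEGIN
    INSERT INTO tablename (id, name, surname) VALUES (2, 'jane', 'smith');
  EXCEPTION WHEN unique_violation THEN
    NULL;  -- duplicate: ignore and continue
  END;
END $$;
```

Note that each `BEGIN ... EXCEPTION` sub-block sets an implicit savepoint, which adds per-row overhead compared to a single bulk insert.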

If you choose not to go with a bulk insert, is your business logic prepared to handle the situation where, say, half of your records were inserted but the other half were not? – Tim Biegeleisen
Absolutely! If it doesn't make the query execution time, say, 50 times longer, then inserting separately is exactly what I need. – stkvtflw
So you are saying that execution took 25-50 times less time for, say, 100 insertions done separately versus 100 in bulk? – Tim Biegeleisen
I think my results are wrong, because that's illogical. Could you try this yourself? It shouldn't take more than a few minutes. – stkvtflw
I can't account for your EXPLAIN results, but you probably don't want to be doing individual inserts in any case, q.v. this Stack Exchange DBA article. – Tim Biegeleisen

1 Answer

2
votes
with i (id, name, surname) as (
  values (1, 'john', 'doe'), (2, 'jane', 'smith')
)
insert into t (id, name, surname)
select id, name, surname
from i
where not exists (
   select 1
   from t
   where t.id = i.id and t.name = i.name and t.surname = i.surname
);

The above has a race condition if data can be inserted in any other way concurrently; just retry if an error occurs.
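Worth noting: if upgrading is an option, Postgres 9.5 added `INSERT ... ON CONFLICT`, which handles this case without the race condition. A sketch, assuming a unique constraint on `id`:

```sql
-- Postgres 9.5+: rows whose id already exists are silently skipped.
INSERT INTO t (id, name, surname)
VALUES (1, 'john', 'doe'), (2, 'jane', 'smith')
ON CONFLICT (id) DO NOTHING;
```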