9
votes

I have a PostgreSQL table with a unique index on some fields to prevent duplicates. Inserts go through a PL/pgSQL function that inserts all the rows and catches the unique_violation exception, although it stops inserting records even if there's just one duplicate.

I can't make several separate INSERTs due to performance issues (some of these batches contain hundreds of rows). The problem is that a single duplicate stops the whole process, like the first two value rows in the following example.

CREATE OR REPLACE FUNCTION easy_import() RETURNS VOID AS
  $$
  BEGIN
    BEGIN
      INSERT INTO things ("title", "uniq1", "uniq2") VALUES
      ('title 1', 100, 102),
      ('title 2', 100, 102),
      ('title 3', 101, 102),
      ('title 4', 102, 102),
      ('title 5', 103, 102),
      ('title 6', 104, 102),
      ('title 7', 105, 102),
      ('title 8', 106, 102),
      ('title 9', 107, 102),
      ('title 10', 108, 102);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing
    END;
  END;
  $$
  LANGUAGE plpgsql;

Is there a way to ignore the unique_violation just for one record and prevent it from stopping further INSERTs?

Thank you.

Update

  • The unique index is on the "uniq1" and "uniq2" fields; I'm sorry about the confusion.
  • While @cdhowie's solution seems to be the best, re-running the same query somehow still triggers an error. It's weird, because the query does the JOIN precisely to avoid that. Still working on it.
4
What is the unique constraint here? (uniq1, uniq2)? - cdhowie
@cdhowie yes, those are the fields on which I created the unique index. - metrobalderas

4 Answers

6
votes

Assuming that the unique constraint is composite around uniq1 and uniq2, this will work:

INSERT INTO things

WITH new_rows (title, uniq1, uniq2) AS (VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102),
    ('title 3', 101, 102),
    ('title 4', 102, 102),
    ('title 5', 103, 102),
    ('title 6', 104, 102),
    ('title 7', 105, 102),
    ('title 8', 106, 102),
    ('title 9', 107, 102),
    ('title 10', 108, 102)
)

SELECT
    DISTINCT ON (n.uniq1, n.uniq2)
    n.title, n.uniq1, n.uniq2

FROM new_rows AS n

LEFT JOIN things AS t
ON n.uniq1 = t.uniq1 AND n.uniq2 = t.uniq2

WHERE t.uniq1 IS NULL;

This may actually wind up being less performant than individual INSERT statements, but it's about the only other thing that will do the trick. Benchmark each approach and see which works best for you.
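One hedged way to run that benchmark is to wrap each variant in EXPLAIN ANALYZE and compare the reported execution times (note that EXPLAIN ANALYZE actually performs the INSERT; the table and column names below are the ones from the question):

```sql
-- Benchmark sketch: compare the reported "Execution Time" of this
-- anti-join INSERT against the per-row function approach.
EXPLAIN ANALYZE
INSERT INTO things
WITH new_rows (title, uniq1, uniq2) AS (VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102)
)
SELECT DISTINCT ON (n.uniq1, n.uniq2)
    n.title, n.uniq1, n.uniq2
FROM new_rows AS n
LEFT JOIN things AS t
    ON n.uniq1 = t.uniq1 AND n.uniq2 = t.uniq2
WHERE t.uniq1 IS NULL;
```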

3
votes

Your table is presumably something like this:

CREATE TABLE t
(
  title text,
  uniq1 int not null,
  uniq2 int not null,
  CONSTRAINT t_pk_u1_u2 PRIMARY KEY (uniq1,uniq2)
);

so let me add a rule to that:

CREATE OR REPLACE RULE ignore_duplicate_inserts_on_t AS ON INSERT TO t
   WHERE (EXISTS ( SELECT 1 FROM t WHERE t.uniq1 = new.uniq1 and t.uniq2 = new.uniq2))
   DO INSTEAD NOTHING;

and after that, you can run this query:

insert into t(title,uniq1,uniq2) values 
    ('title 1', 100, 102),
    ('title 2', 100, 102),
    ...;
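To sanity-check that the rule is silently skipping duplicates instead of raising an error, you can insert the same (uniq1, uniq2) pair twice (assuming the table starts without that key):

```sql
-- The second insert should be rewritten into a no-op by the rule
-- rather than raising a unique_violation.
INSERT INTO t (title, uniq1, uniq2) VALUES ('title 1', 100, 102);
INSERT INTO t (title, uniq1, uniq2) VALUES ('title 1 again', 100, 102);
SELECT count(*) FROM t WHERE uniq1 = 100 AND uniq2 = 102;  -- expect 1
```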

If your table is large, this approach is optimal. I tested both this way and the JOIN way mentioned above by Mr. cdhowie on a table with about 2 million rows; the results were:

Rule way (mentioned in this answer): 1400 rows per second
Join way (mentioned in the answer above): 650 rows per second
2
votes

To get the behavior you are looking for, just split the INSERT statement into individual statements instead of putting multiple value rows in one statement. There shouldn't be any noticeable difference in performance compared to your original version, since everything still runs in a single transaction.

CREATE OR REPLACE FUNCTION easy_import() RETURNS VOID AS
$$
BEGIN
BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 1', 100, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 2', 100, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 3', 101, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 4', 102, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 5', 103, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 6', 104, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 7', 105, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 8', 106, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 9', 107, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

BEGIN
  INSERT INTO things ("title", "uniq1", "uniq2") VALUES ('title 10', 108, 102);
EXCEPTION WHEN unique_violation THEN
 -- do nothing
END;

RETURN;
END;
$$
LANGUAGE plpgsql;
1
votes

Nope.

The BEGIN ... EXCEPTION block is a subtransaction, and when a statement within the block fails, the whole subtransaction is rolled back. It's all or nothing.

Use cdhowie's approach - a JOIN on the VALUES.
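As a concrete illustration of that all-or-nothing behavior: run the original function against an empty things table. The duplicate (100, 102) pair in the first two rows makes the whole multi-row INSERT fail, the exception handler rolls back the subtransaction, and none of the ten rows survive:

```sql
-- Assuming "things" is empty and has a unique constraint on (uniq1, uniq2):
SELECT easy_import();          -- unique_violation is caught and swallowed
SELECT count(*) FROM things;   -- 0: the entire INSERT was rolled back
```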