5
votes

I'm currently trying to insert in batch many records (~2000) and Jooq's batchInsert is not doing what I want.

I'm transforming POJOs into UpdatableRecords and then I'm performing batchInsert which is executing insert for each record. So Jooq is doing ~2000 queries for each batch insert and it's killing database performance.

It's executing this code (jooq's batch insert):

for (int i = 0; i < records.length; i++) {
            Configuration previous = ((AttachableInternal) records[i]).configuration();

            try {
                records[i].attach(local);
                executeAction(i);
            }
            catch (QueryCollectorSignal e) {
                Query query = e.getQuery();
                String sql = e.getSQL();

                // Aggregate executable queries by identical SQL
                if (query.isExecutable()) {
                    List<Query> list = queries.get(sql);

                    if (list == null) {
                        list = new ArrayList<Query>();
                        queries.put(sql, list);
                    }

                    list.add(query);
                }
            }
            finally {
                records[i].attach(previous);
            }
        }

I could just do it like this (because Jooq is doing same thing internally):

records.forEach(UpdatableRecord::insert);

instead of:

jooq.batchInsert(records).execute();

How can I tell Jooq to create new records in batch mode? Should I transform records into bind queries and then call batchInsert? Any ideas? ;)

1

1 Answers

4
votes

jOOQ's DSLContext.batchInsert() creates one JDBC batch statement per set of consecutive records with identical generated SQL strings (the Javadoc doesn't formally define this, unfortunately).

This can turn into a problem when your records look like this:

+------+--------+--------+
| COL1 | COL2   | COL3   |
+------+--------+--------+
| 1*   | {null} | {null} |
| 2*   | B*     | {null} |
| 3*   | {null} | C*     |
| 4*   | D*     | D*     |
+------+--------+--------+

.. because in that case, the generated SQL strings will look like this:

INSERT INTO t (col1) VALUES (?);
INSERT INTO t (col1, col2) VALUES (?, ?);
INSERT INTO t (col1, col3) VALUES (?, ?);
INSERT INTO t (col1, col2, col3) VALUES (?, ?, ?);

The reason for this default behaviour is the fact that this is the only way to guarantee ... DEFAULT behaviour. As in SQL DEFAULT. I gave a rationale of this behaviour here.

With this in mind, and as each consecutive SQL string is different, the inserts unfortunately aren't batched as a single batch as you intended.

Solution 1: Make sure all changed flags are true

One way to enforce all INSERT statements to be the same is to set all changed flags of each individula record to true:

for (Record r : records)
    r.changed(true);

Now, all SQL strings will be the same.

Solution 2: Use the Loader API

Instead of batching, you could import the data (and specify batch sizes there). For details, see the manual's section about importing records:

https://www.jooq.org/doc/latest/manual/sql-execution/importing/importing-records

Solution 3: Use a batch statement instead

Your usage of batchInsert() is convenience that works when using TableRecords. But of course, you can generate an INSERT statement manually and batch the individual bind variables by using jOOQ's batch statement API:

https://www.jooq.org/doc/latest/manual/sql-execution/batch-execution

A note on performance

There are a couple of open issues regarding the DSLContext.batchInsert() and similar API. The client side algorithm that generates SQL strings for each individual record is inefficient and might be changed in the future, relying on changed() flags directly. Some relevant issues: