
I am using a UUID (36 characters) as the primary key of a table. Since we hit about 150 million records, inserts have become very slow (this seems to be due to the PK index).

Is there any way to improve this situation? Or do you have any ideas?

Details:

  • Row length average of the 150 M rows.

    About 60 chars in total (10 columns, 8 are just IDs)

  • Definition of slow

    800-1000 rows per second

  • After how many rows loaded and at which moment does it become slow.

    AFAIK it was slow from the first rows

  • How are you loading (SQL*Loader? SQL? Own code? Threads?)

    Informatica Powercenter

  • Exact Oracle version (select * from v$version)

    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

@MikeW, INSERT DELAYED is MySQL-only feature - but this question is tagged Oracle - mvp
What leads you to suspect the PK index? Have you examined a trace file for a session doing a lot of inserts? - Jeffrey Kemp
Are you invoking Oracle's direct path insert with Powercenter, and if you think you are have you checked the logs to confirm that you are? As I recall there were about 3 different ways of accidentally not using it. - David Aldridge
hi @user2428207 has any of these answers been helpful to solve your problem? If not please add what was not successful to your question. When an answer helped you, can you accept it by clicking the hollow green check mark next to it? - Guido Leenders

2 Answers


First of all: 800-1000 rows per second is not that slow, especially coming from an external source. I guess you are inserting rows one by one, so Oracle parses each statement and you have round-trip delays between every insert.
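To illustrate the difference, a single-statement bulk approach parses once and binds many rows at a time. A minimal PL/SQL sketch (table and column names are hypothetical):

```sql
-- Hypothetical staging-to-target load: one parse, array-sized binds,
-- instead of 150 M individually parsed single-row inserts.
DECLARE
  TYPE t_ids IS TABLE OF VARCHAR2(36);
  l_ids t_ids;
BEGIN
  SELECT id BULK COLLECT INTO l_ids
  FROM   staging_table;              -- hypothetical source table

  FORALL i IN 1 .. l_ids.COUNT
    INSERT INTO target_table (id)    -- hypothetical target table
    VALUES (l_ids(i));

  COMMIT;
END;
/
```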

About increasing performance:

A. Check whether you have additional indexes on your table.

Look especially for bitmap indexes, because they are very expensive to maintain during inserts.

You can also easily check whether the primary key is really the problem:

alter table <table> drop constraint <primary key>;
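If dropping the constraint confirms the index is the bottleneck, re-create it once after the load; building the index once over all rows is much cheaper than 150 M incremental maintenance steps. A sketch with hypothetical names:

```sql
-- Drop the PK (and its index) before the bulk load.
alter table my_table drop constraint my_table_pk;

-- ... run the load ...

-- Re-create the PK afterwards; Oracle builds the index in one pass.
alter table my_table
  add constraint my_table_pk primary key (id);
```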

B. Check whether you have a materialized view log plus a materialized view with REFRESH ON COMMIT on this table. This can also hurt very much.

If you want the fastest load performance, do not use any indexes on the table at all, and maintain a second, indexed copy that is refreshed in the background. A materialized view may be the way to do this.
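A quick way to check for such objects (the table name is hypothetical; run as a user who can see the relevant schema):

```sql
-- Is there a materialized view log on the table?
select log_table
from   all_mview_logs
where  master = 'MY_TABLE';          -- hypothetical table name

-- Are there ON COMMIT materialized views in reach?
select mview_name, refresh_mode
from   all_mviews
where  refresh_mode = 'COMMIT';
```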


A primary key of 36 characters should not, by itself, really slow down inserting rows. A primary key is enforced by an index with unique values. 36 characters plus some overhead, plus something extra when AL32UTF8 is used, still allows many values to fit in one index leaf block with an 8 KB block size.
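A rough back-of-envelope check can be done directly in SQL. The ~10 bytes of per-entry overhead and ~90% usable block space are assumptions for illustration, not exact Oracle internals:

```sql
-- Approximate index entries per 8 KB leaf block for a 36-byte key.
select floor(8192 * 0.9 / (36 + 10)) as entries_per_leaf_block
from   dual;
-- roughly 160 entries per leaf block
```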

There is probably something else going on.

Please define:

  • Row length average of the 150 M rows.
  • Definition of slow
  • After how many rows loaded and at which moment does it become slow.
  • How are you loading (SQL*Loader? SQL? Own code? Threads?)
  • Exact Oracle version (select * from v$version)

Dropping index helped

From the additions to the question, I understand that after dropping the sole index (the PK index), performance was reasonable (22 K rows/s).

You might want to check the following:

  • Check what backend PowerCenter is using; this depends on the ETL job, the version, and the capabilities of the platform. Maybe it is insert/select, parallel yes/no, or even extract and reload. For simplicity, let's assume it is some parallel select and insert.
  • 11.2.0.3 is a good Oracle release with rather few annoying bugs.
  • Depending on how the DBA configured space management, look at the INITRANS and MAXTRANS settings of the PK index. The first determines the number of transactions allowed to be concurrently active in a block. Set INITRANS to, for instance, the number of concurrent inserts.
  • Check that the statistics on the data dictionary are either totally absent or up-to-date (look in dba_tables, for instance).
  • Check that the statistics on your own tables are absent or reasonably accurate (analyze table xxx compute statistics for table for all indexes for all indexed columns). Repeat this step after each major change in the amount and/or distribution of the loaded data.
  • Sometimes, and especially during initial loading, the index can become too fragmented. Maybe the values generated for the UUID contribute to this. You can detect this with a query, but to keep it simple: always rebuild the index after initially loading the table. Over time the relative changes get smaller; going from 1 row to 1,000,000 is a big change, going from 1 million to 2 million is not. You should need to do this only once for now.
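Taken together, the steps above could look roughly like this. Index and table names are hypothetical, and the INITRANS value should match your actual concurrency:

```sql
-- Allow more concurrent transactions per index block.
alter index my_table_pk rebuild initrans 8;

-- Refresh optimizer statistics after a major load
-- (DBMS_STATS is the supported mechanism on 11g).
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'MY_TABLE',
    cascade => true);        -- include index statistics
end;
/

-- Rebuild the PK index once after the initial load.
alter index my_table_pk rebuild;
```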

As a side note: 60 characters for a row with a 36-character column as primary key seems unrealistically small. Each numeric ID takes approximately precision / 2 + 1 bytes (BCD format). But even if the row size were double, it should not have a major impact.

Detecting indexes benefiting from rebuild

Addition upon request:

The following query is what I use to determine indexes eligible for rebuilding. The criteria at the end can be changed; for production use in an OLTP environment they are sufficient to suppress unnecessary warnings while still detecting the really bad cases. Run it as a privileged user or grant read access to the relevant data dictionary views. It should run on Oracle 8, 8i, 9 and 10; I have not recently used it on 11 or 12.

I know that directly querying the data dictionary is not always desirable, for instance when using multiple editions of a data model, but it performs considerably better. Indexes can benefit from a rebuild either through storage savings (in the dark ages, when you only had a few MB on a VMS server, each KB was valuable) or sometimes through better performance.

ttitle "Warning: Fragmented Indexes (&dbname)" -
      skip 2 "Corrective action:" -
      skip 1 "Rebuild index (rebuildIndexes.sql)." -
      skip 2

column owner       format a30         heading "Owner"
column ind_name    format a30         heading "Index"
column num_rows    format 999,999,990 heading "#Rows"
column cur_size_kb format 9,999,990   heading "Current Size (Kb)"
column est_size_kb format 9,999,990   heading "Est. Size (Kb)"
column ratio       format 9,990       heading "Ratio cur/est"

select /*+ rule */ usr.name owner
,      objind.name ind_name
,      round
       (
         ( sum
           ( (tab.rowcnt - head.null_cnt) * col.length * 0.5
             /* 50% average fill */
           )
           * 1.5 /* 75% usage after some delete/update */
           + 5 * ts.blocksize
         ) /1024
       ) est_size_kb
,      round(ts.blocksize * seg.blocks / 1024) cur_size_kb
,      round
       ( ts.blocksize * seg.blocks
         / ( sum((tab.rowcnt - head.null_cnt) * col.length * 0.5) * 1.5
             + 5 * ts.blocksize
           )
       ) ratio
,      tab.rowcnt num_rows
from   sys.ind$ ind
,      sys.hist_head$ head
,      sys.col$ col
,      sys.icol$ icol
,      sys.obj$ objind
,      sys.obj$ objtab
,      sys.tab$ tab
,      sys.ts$  ts
,      sys.seg$ seg
,      sys.user$ usr
where  1=1
and    ts.ts# = seg.ts#
and    seg.file# = ind.file#
and    seg.block# = ind.block#
and    ind.obj# = objind.obj#
and    head.col# (+) = col.col#
and    head.obj# (+) = col.obj#
and    icol.obj# = ind.obj#
and    col.col# = icol.col#
and    col.obj# = objtab.obj#
       /* To save at least 25 Mb, the index must be over 25 Mb. */
and    ts.blocksize * seg.blocks > 25 * 1024
and    ind.bo# = objtab.obj#
and    objind.obj# = ind.obj#
and    tab.obj# = objtab.obj#
and    objtab.owner# = usr.user#
and    usr.name not in ('SYS','SYSTEM')
group
by     objind.name
,      ts.blocksize
,      seg.blocks
,      tab.rowcnt
,      usr.name
having ts.blocksize * seg.blocks / ( sum((tab.rowcnt - head.null_cnt) * col.length * 0.5) * 1.5 + 5 * ts.blocksize )
       >= 2
and    ( ts.blocksize * seg.blocks / 1024 )
       -
       ( sum((tab.rowcnt - head.null_cnt) * col.length * 0.5) * 1.5
         + 5 * ts.blocksize
       ) / 1024
       > 25 * 1024
order by 5 desc
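For each index the report flags, the corrective action mentioned in the title is a rebuild. A sketch, with the owner and index name taken from the report output:

```sql
-- Rebuild a flagged index; ONLINE (Enterprise Edition) avoids
-- blocking concurrent DML during the rebuild.
alter index owner_name.index_name rebuild online;
```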