1
votes

We are migrating a legacy project with huge data set (100+ Tables and many of them are 10 million+ records). Few of them are huge (around 100 million).

When we switch the application, we will get only couple of hours to migrate this data to new application which has same tables but slightly different structure (so cannot import or pump).

We are inserting the records as follows:

  1. Created a DB Link from source to Destination database
  2. DISABLE all Constraints on destination table (and child tables if any)
  3. ALTER all indexes as UNUSABLE
  4. INSERT /+ APPEND NOLOGGING PARALLEL/ INTO destination_table SELECT /*+ PARALLEL */ FROM source_table
  5. REBUILD all indexes (REBUILD PARALLEL NOLOGGING and then change to NOPARALLEL and LOGGING)
  6. ENABLE all Constraints on destination table (and child tables if any)

It works well but takes longer to finish (avg 5 mins for around 10 million records). Most of the time is spent in STEP 6 and specially for FOREIGN KEY constraints where both Parent and Child are huge.

How can we optimize the insert performance of our database as described above?

3
Are you doing Step 6 (enable constraint) in parallel also? - btpys
@btpys We do not enable constraint in parallel explicitly. But are those not done in parallel by default? Do you mean Enable novalidate and then validate the constraint? - Nishant Hadole
No, I mean altering the session before enabling the constraints: ALTER SESSION FORCE PARALLEL DDL PARALLEL 8; - btpys
@btpys Sure will give it a try. Also There are few non-cached sequences which I think can improve some performance. - Nishant Hadole
Yes that helped. Now we are able to insert around 593 million records in 84 minutes, which is quite acceptable. - Nishant Hadole

3 Answers

3
votes

Happy to receive different approaches however we will be refining our approach (as we already have working solution). Being said that this might not "BEST" way for sure but it addresses our concerns in this specific case.

So the steps we finally followed are;

  1. Disable all Constraints on Destination Table (and Child tables if any)
  2. Disable all Indexes on Destination Table
  3. Increase Cache of Sequences if used
  4. Perform INSERT (hint APPEND & NOLOGGING) INTO SELECT (hint PARALLEL)
  5. Rebuild all Indexes on Destination Table with PARALLEL NOLOGGING
  6. Alter Indexes to change to NOPARALLEL & LOGGING
  7. Enable all Constraints using NOVALIDATE (and Child tables if any)
  8. ALTER TABLE TABLE_NAME PARALLEL
  9. Enable all Constraints using VALIDATE
  10. ALTER TABLE TABLE_NAME NOPARALLEL

Above steps are repeated for all tables in a script which starts with

ALTER SESSION ENABLE PARALLEL DDL

Tables: 99

Records: 593,960,688

Time taken: 01:23:44 Hrs

2
votes

Well if you're looking for performance, you're going to have a problem piping data over a db_link.

Unload the data to flat files, then setup external table/CTAS or SQL Loader to load the data into the new system. It will be significantly faster.

A very small example showing the type of rates I was able to achieve on just a laptop and a VM.

TL/DR;

112,000 rows per second via SQL Loader...on a windows laptop.

0
votes

If you only get two hours to migrate a huge project, you can do the following:

  1. Make the data read-only after step 4 (perhaps using a separate tablespace for the project, then do a alter tablespace new_stuff read only; )
  2. Let the users into the application. The users can view the data (possibly slowly until step 5 is done)
  3. After step 6 is done, alter the tablespace to be read write.