I have a table with over a billion rows. I essentially want a copy of this table, but I need to add the records in smaller chunks, starting from the newest record. I would like the total operation (create table, insert rows, create indexes) to take the shortest amount of time.
Both source and destination have 3 indexes:
1 = Clustered (Column1 ASC, Column2 ASC)
2 = Non-Unique Non-Clustered (Column2 ASC, Column3 ASC)
3 = Non-Unique Non-Clustered (Column4 ASC, Column5 ASC)
I want to insert the rows ordered by Column1 DESC. In 90% of cases Column1 is unique, and in about 9% of cases there are fewer than 5 duplicates. Column1+Column2 is always unique. I can add ordering on other columns if it helps optimization, but the ordering has to start with Column1 DESC.
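To make the layout concrete, here is a minimal sketch of what both tables look like. The table name, column types, and index names (MyTable_Copy, IX1-IX3) are placeholders, not the real schema; in the plan below, the CREATE INDEX statements would only run after the data load.

    -- Hypothetical names and types; the real schema is wider.
    CREATE TABLE dbo.MyTable_Copy
    (
        Column1 BIGINT    NOT NULL,
        Column2 INT       NOT NULL,
        Column3 INT       NULL,
        Column4 DATETIME2 NULL,
        Column5 INT       NULL
    );

    -- Index #1: clustered (not declared unique, although Column1 + Column2 happens to be unique)
    CREATE CLUSTERED INDEX IX1 ON dbo.MyTable_Copy (Column1 ASC, Column2 ASC);
    -- Index #2: non-unique non-clustered
    CREATE NONCLUSTERED INDEX IX2 ON dbo.MyTable_Copy (Column2 ASC, Column3 ASC);
    -- Index #3: non-unique non-clustered
    CREATE NONCLUSTERED INDEX IX3 ON dbo.MyTable_Copy (Column4 ASC, Column5 ASC);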
So I would like to know what the fastest process would be. Note that the inserts will be done in chunks (probably 1 million rows each). My current plan is the following:
1 : Create table with no index
2 : Insert the data in chunks, ORDER BY Column1 DESC. Each chunk is wrapped in a BEGIN/COMMIT (ROLLBACK on error) TRANSACTION block, inside a stored procedure called by a SQL Agent job on a regular interval (see the sketch after this list).
3 : After every X chunks, shrink the transaction log, to keep disk space from blowing up.
4 : Create index #1
5 : Create index #2
6 : Create index #3
7 : Shrink Log
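To make step 2 concrete, here is a rough sketch of the kind of loop I have in mind for the stored procedure. The object names (dbo.SourceTable, dbo.MyTable_Copy, dbo.CopyWatermark, dbo.CopyNextChunk), the types, and the chunk size are all placeholders, and the Column2 DESC tie-break is something I would add so that chunk boundaries inside duplicate Column1 values stay unambiguous (possible because Column1+Column2 is always unique).

    -- Sketch only: names, types, and chunk size are placeholders.
    -- One-row bookkeeping table remembering the last (lowest) key copied.
    CREATE TABLE dbo.CopyWatermark (LastC1 BIGINT NULL, LastC2 INT NULL);
    INSERT INTO dbo.CopyWatermark (LastC1, LastC2) VALUES (NULL, NULL);
    GO

    CREATE PROCEDURE dbo.CopyNextChunk
        @ChunkSize INT = 1000000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @LastC1 BIGINT, @LastC2 INT, @NewC1 BIGINT, @NewC2 INT;
        DECLARE @ins TABLE (Column1 BIGINT, Column2 INT);

        SELECT @LastC1 = LastC1, @LastC2 = LastC2 FROM dbo.CopyWatermark;

        BEGIN TRY
            BEGIN TRANSACTION;

            -- Next slice of keys below the watermark, walking Column1 DESC, Column2 DESC.
            INSERT INTO dbo.MyTable_Copy (Column1, Column2, Column3, Column4, Column5)
            OUTPUT inserted.Column1, inserted.Column2 INTO @ins (Column1, Column2)
            SELECT TOP (@ChunkSize)
                   s.Column1, s.Column2, s.Column3, s.Column4, s.Column5
            FROM dbo.SourceTable AS s
            WHERE @LastC1 IS NULL
               OR s.Column1 < @LastC1
               OR (s.Column1 = @LastC1 AND s.Column2 < @LastC2)
            ORDER BY s.Column1 DESC, s.Column2 DESC;

            -- Advance the watermark to the lowest key copied in this chunk.
            SELECT TOP (1) @NewC1 = Column1, @NewC2 = Column2
            FROM @ins
            ORDER BY Column1 ASC, Column2 ASC;

            IF @NewC1 IS NOT NULL
                UPDATE dbo.CopyWatermark SET LastC1 = @NewC1, LastC2 = @NewC2;

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            THROW;
        END CATCH
    END;

The SQL job would just call dbo.CopyNextChunk repeatedly until a run inserts no rows; steps 4-6 are then the plain CREATE INDEX statements shown in the schema sketch above.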
See also : TSql, building indexes before or after data input
If you keep the database in FULL recovery mode and take transaction log backups on a frequent interval (i.e., with an Agent job that runs on a schedule), you will prevent the log from getting excessively large without shrinking. – alroc
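Following up on that comment, a minimal sketch of what such a scheduled log backup step might look like; the database name (MyDatabase) and the backup path are placeholders.

    -- Run from an Agent job every few minutes while the copy is in progress.
    -- Each backup gets a unique file name so earlier log backups are kept.
    DECLARE @file NVARCHAR(260) =
        N'X:\LogBackups\MyDatabase_'
        + CONVERT(NVARCHAR(8), GETDATE(), 112)                      -- yyyymmdd
        + N'_'
        + REPLACE(CONVERT(NVARCHAR(8), GETDATE(), 108), N':', N'')  -- hhmmss
        + N'.trn';

    BACKUP LOG [MyDatabase] TO DISK = @file;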