I have a table with over a billion rows. I essentially want a copy of this table, but I need to add the records in smaller chunks, starting from the newest record. I would like the total operation (create table, insert rows, create indexes) to take the shortest amount of time.
Both source and destination have 3 indexes:
1 = Clustered (Column1 ASC, Column2 ASC)
2 = Non-Unique Non-Clustered (Column2 ASC, Column3 ASC)
3 = Non-Unique Non-Clustered (Column4 ASC, Column5 ASC)
I want to insert the rows ordered by Column1 DESC. In 90% of cases Column1 is unique, and in about 9% of cases there are fewer than 5 duplicates. Column1+Column2 is always unique. I can add ordering on other columns if it helps optimization, but the ordering has to start with Column1 DESC.
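To make the layout concrete, here is a minimal sketch of what both tables look like. The table name, column types, and index names (MyTable_Copy, IX1-IX3) are placeholders, not the real schema; in the plan below, the CREATE INDEX statements would only run after the data load.

    -- Hypothetical names and types; the real schema is wider.
    CREATE TABLE dbo.MyTable_Copy
    (
        Column1 BIGINT    NOT NULL,
        Column2 INT       NOT NULL,
        Column3 INT       NULL,
        Column4 DATETIME2 NULL,
        Column5 INT       NULL
    );

    -- Index #1: clustered (not declared unique, although Column1 + Column2 happens to be unique)
    CREATE CLUSTERED INDEX IX1 ON dbo.MyTable_Copy (Column1 ASC, Column2 ASC);
    -- Index #2: non-unique non-clustered
    CREATE NONCLUSTERED INDEX IX2 ON dbo.MyTable_Copy (Column2 ASC, Column3 ASC);
    -- Index #3: non-unique non-clustered
    CREATE NONCLUSTERED INDEX IX3 ON dbo.MyTable_Copy (Column4 ASC, Column5 ASC);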
So I would like to know what the fastest process would be. Note that the inserts will be done in chunks (probably 1 million rows each). My current plan is the following:
1 : Create table with no index
2 : Insert the data in chunks, ORDER BY Column1 DESC. Each chunk is wrapped in a BEGIN/COMMIT (ROLLBACK on error) TRANSACTION block, inside a stored procedure called by a SQL Agent job on a regular interval (see the sketch after this list).
3 : After every X chunks, shrink the transaction log, to keep disk space from blowing up.
4 : Create index #1
5 : Create index #2
6 : Create index #3
7 : Shrink Log
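To make step 2 concrete, here is a rough sketch of the kind of loop I have in mind for the stored procedure. The object names (dbo.SourceTable, dbo.MyTable_Copy, dbo.CopyWatermark, dbo.CopyNextChunk), the types, and the chunk size are all placeholders, and the Column2 DESC tie-break is something I would add so that chunk boundaries inside duplicate Column1 values stay unambiguous (possible because Column1+Column2 is always unique).

    -- Sketch only: names, types, and chunk size are placeholders.
    -- One-row bookkeeping table remembering the last (lowest) key copied.
    CREATE TABLE dbo.CopyWatermark (LastC1 BIGINT NULL, LastC2 INT NULL);
    INSERT INTO dbo.CopyWatermark (LastC1, LastC2) VALUES (NULL, NULL);
    GO

    CREATE PROCEDURE dbo.CopyNextChunk
        @ChunkSize INT = 1000000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @LastC1 BIGINT, @LastC2 INT, @NewC1 BIGINT, @NewC2 INT;
        DECLARE @ins TABLE (Column1 BIGINT, Column2 INT);

        SELECT @LastC1 = LastC1, @LastC2 = LastC2 FROM dbo.CopyWatermark;

        BEGIN TRY
            BEGIN TRANSACTION;

            -- Next slice of keys below the watermark, walking Column1 DESC, Column2 DESC.
            INSERT INTO dbo.MyTable_Copy (Column1, Column2, Column3, Column4, Column5)
            OUTPUT inserted.Column1, inserted.Column2 INTO @ins (Column1, Column2)
            SELECT TOP (@ChunkSize)
                   s.Column1, s.Column2, s.Column3, s.Column4, s.Column5
            FROM dbo.SourceTable AS s
            WHERE @LastC1 IS NULL
               OR s.Column1 < @LastC1
               OR (s.Column1 = @LastC1 AND s.Column2 < @LastC2)
            ORDER BY s.Column1 DESC, s.Column2 DESC;

            -- Advance the watermark to the lowest key copied in this chunk.
            SELECT TOP (1) @NewC1 = Column1, @NewC2 = Column2
            FROM @ins
            ORDER BY Column1 ASC, Column2 ASC;

            IF @NewC1 IS NOT NULL
                UPDATE dbo.CopyWatermark SET LastC1 = @NewC1, LastC2 = @NewC2;

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            THROW;
        END CATCH
    END;

The SQL job would just call dbo.CopyNextChunk repeatedly until a run inserts no rows; steps 4-6 are then the plain CREATE INDEX statements shown in the schema sketch above.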
See also : TSql, building indexes before or after data input
If you keep the database in FULL recovery mode and take transaction log backups on a frequent interval (i.e., with an Agent job that runs on a schedule), you will prevent the log from getting excessively large without shrinking. – alroc
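Following up on that comment, a minimal sketch of what such a scheduled log backup step might look like; the database name (MyDatabase) and the backup path are placeholders.

    -- Run from an Agent job every few minutes while the copy is in progress.
    -- Each backup gets a unique file name so earlier log backups are kept.
    DECLARE @file NVARCHAR(260) =
        N'X:\LogBackups\MyDatabase_'
        + CONVERT(NVARCHAR(8), GETDATE(), 112)                      -- yyyymmdd
        + N'_'
        + REPLACE(CONVERT(NVARCHAR(8), GETDATE(), 108), N':', N'')  -- hhmmss
        + N'.trn';

    BACKUP LOG [MyDatabase] TO DISK = @file;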