Need to delete duplicate records from the table. Table contains 33 columns out of them only PK_NUM
is the primary key columns. As PK_NUM
contains unique records we need to consider either min/max value.
- Total records in the table : 1766799022
- Distinct records in the table : 69237983
- Duplicate records in the table : 1697561039
Column details :
- 4 : Date data type
- 4 : Number data type
- 1 : Char data type
- 24 : Varchar2 data type
Size of table : 386 GB
DB details : Oracle Database 11g EE::11.2.0.2.0 ::64bit Production
Sample data :
- col1 ,col2,col3
- 1,ABC,123
- 2,PQR,456
- 3,ABC,123
Expected data should contains only 2 records:
- col1,col2,col3
- 1,ABC,123
- 2,PQR,456
*1 can be replaced by 3 ,vice versa.
My plan here is to
- Pull distinct records and store it in a back up table.(ie by using insert into select)
- Truncate existing table and move records from back up to existing.
As data size is huge ,
- Want to know what is the optimized sql for retrieving the distinct records
- Any estimate on how much it will take to complete (insert into select) and to truncate the existing table.
Please do let me know, if there is any other best way to achieve this. My ultimate goal is to remove the duplicates.