5
votes


Hi!
I need to scan a very large table on mysql (5.1),
this is how the table looks like more or less:

 CREATE TABLE `big_table` (
   `id` BIGINT(11) NOT NULL AUTO_INCREMENT,
   `main_id` INT(11) DEFAULT NULL,
   `key` VARCHAR(20) NOT NULL,
   PRIMARY KEY (`id`),
   KEY `main_id_key` (`main_id`,`key`),
 ) ENGINE=INNODB AUTO_INCREMENT=2315678197 DEFAULT CHARSET=utf8

I need to get all unique values of main_id + keys into a new table.
Using the following query takes a lot of time (still running after 3 days on a very fast server):

 CREATE TABLE `get_unique` (
   `main_id` int(11) NOT NULL,
   `key` varchar(20) NOT NULL,
   PRIMARY KEY (`main_id`,`key`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 INSERT IGNORE INTO get_unique 
 SELECT main_id,key FROM big_table

So my question is -
Will this be faster?

 CREATE TABLE `get_unique` (
   `main_id` int(11) NOT NULL,
   `key` varchar(20) NOT NULL,
   PRIMARY KEY (`main_id`,`key`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 INSERT INTO get_unique 
 SELECT main_id,key FROM big_table
 GROUP BY 1,2
1
Results: INSERT IGNORE was running for over 47 hours before it was killed. INSERT as SELECT ... GROUP BY finished running after 15 hours. - Mistdemon
If your goal was to get rid of the AUTO_INCREMENT column `id` in favor of the `main_id_key` index, and removing the duplicate rows then you could just have used ALTER IGNORE TABLE `big_table` ADD UNIQUE (`main_id`, `key`). No need for a temporary table like `get_unique`. See dev.mysql.com/doc/refman/5.1/en/alter-table.html for more. - Kim

1 Answers

4
votes

Yes GROUP BY main_id, key will perform many times faster as compared to INSERT IGNORE.

SELECT.. GROUP BY main_id, key would get executed faster by making use of covering index and result in a fewer number of records, whereas INSERT IGNORE will involve INDEX KEY look-ups for every row that is being inserted.