5
votes


Hi!
I need to scan a very large table on mysql (5.1),
this is how the table looks like more or less:

 CREATE TABLE `big_table` (
   `id` BIGINT(11) NOT NULL AUTO_INCREMENT,
   `main_id` INT(11) DEFAULT NULL,
   `key` VARCHAR(20) NOT NULL,
   PRIMARY KEY (`id`),
   KEY `main_id_key` (`main_id`,`key`),
 ) ENGINE=INNODB AUTO_INCREMENT=2315678197 DEFAULT CHARSET=utf8

I need to get all unique values of main_id + keys into a new table.
Using the following query takes a lot of time (still running after 3 days on a very fast server):

 CREATE TABLE `get_unique` (
   `main_id` int(11) NOT NULL,
   `key` varchar(20) NOT NULL,
   PRIMARY KEY (`main_id`,`key`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 INSERT IGNORE INTO get_unique 
 SELECT main_id,key FROM big_table

So my question is -
Will this be faster?

 CREATE TABLE `get_unique` (
   `main_id` int(11) NOT NULL,
   `key` varchar(20) NOT NULL,
   PRIMARY KEY (`main_id`,`key`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 INSERT INTO get_unique 
 SELECT main_id,key FROM big_table
 GROUP BY 1,2
1
Results: INSERT IGNORE was running for over 47 hours before it was killed. INSERT as SELECT ... GROUP BY finished running after 15 hours.Mistdemon
If your goal was to get rid of the AUTO_INCREMENT column `id` in favor of the `main_id_key` index, and removing the duplicate rows then you could just have used ALTER IGNORE TABLE `big_table` ADD UNIQUE (`main_id`, `key`). No need for a temporary table like `get_unique`. See dev.mysql.com/doc/refman/5.1/en/alter-table.html for more.Kim

1 Answers

4
votes

Yes GROUP BY main_id, key will perform many times faster as compared to INSERT IGNORE.

SELECT.. GROUP BY main_id, key would get executed faster by making use of covering index and result in a fewer number of records, whereas INSERT IGNORE will involve INDEX KEY look-ups for every row that is being inserted.