5
votes

I have a table for log entries, and a description table for the roughly 100 possible log codes:

CREATE TABLE `log_entries` (
  `logentry_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `partner_id` smallint(4) NOT NULL,
  `log_code` smallint(4) NOT NULL,
  PRIMARY KEY (`logentry_id`),
  KEY `IX_code` (`log_code`),
  KEY `IX_partner_code` (`partner_id`,`log_code`)
) ENGINE=MyISAM ;

CREATE TABLE IF NOT EXISTS `log_codes` (
  `log_code` smallint(4) NOT NULL DEFAULT '0',
  `log_desc` varchar(255) DEFAULT NULL,
  `category_overview` tinyint(1) NOT NULL DEFAULT '0',
  `category_error` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`log_code`),
  KEY `IX_overview_code` (`category_overview`,`log_code`),
  KEY `IX_error_code` (`category_error`,`log_code`)
) ENGINE=MyISAM ;

The following query (matching 10k of 20k rows) executes in 0.0034 sec (using LIMIT 0,20):

SELECT log_entries.date, log_codes.log_desc FROM log_entries 
INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code 
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1;

But when adding ORDER BY log_entries.logentry_id DESC, which is of course necessary, it slows down to 0.6 sec. Probably because "Using temporary" is used on the log_codes table? Removing the indexes actually makes the query perform faster, but still slow (0.3 sec).

EXPLAIN output of the query without ORDER BY:

+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
| id | select_type | table       | type | possible_keys              | key              | key_len | ref                      | rows | Extra       |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | log_codes   | ref  | PRIMARY,IX_overview_code   | IX_overview_code | 1       | const                    |   56 |             |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_partner_code    | IX_partner_code  | 7       | const,log_codes.log_code |   25 | Using where |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+-------------+

And including the ORDER BY:

+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
| id | select_type | table       | type | possible_keys              | key              | key_len | ref                      | rows | Extra                           |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+
|  1 | SIMPLE      | log_codes   | ref  | PRIMARY,IX_overview_code   | IX_overview_code | 1       | const                    |   56 | Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_partner_code    | IX_partner_code  | 7       | const,log_codes.log_code |   25 | Using where                     |
+----+-------------+-------------+------+----------------------------+------------------+---------+--------------------------+------+---------------------------------+

Any hints on how to get this query to perform faster? I can't see why "Using temporary" should be needed, as the log codes should be chosen before fetching and sorting the appropriate log entries?

UPDATE @Eugen Rieck:

SELECT log_entries.date, lc.log_desc FROM log_entries INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc ON lc.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 ORDER BY log_entries.logentry_id;
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
| id | select_type | table       | type | possible_keys           | key              | key_len | ref               | rows | Extra                           |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL  | NULL                    | NULL             | NULL    | NULL              |   57 | Using temporary; Using filesort |
|  1 | PRIMARY     | log_entries | ref  | IX_code,IX_partner_code | IX_partner_code  | 7       | const,lc.log_code |   25 | Using where                     |
|  2 | DERIVED     | log_codes   | ref  | IX_overview_code        | IX_overview_code | 1       |                   |   56 |                                 |
+----+-------------+-------------+------+-------------------------+------------------+---------+-------------------+------+---------------------------------+

UPDATE @RolandoMySQLDBA:

With my original indexes, ORDER BY date DESC:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type | possible_keys    | key              | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL  | NULL             | NULL             | NULL    | NULL |    57 | Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2>  | ALL  | NULL             | NULL             | NULL    | NULL | 21937 | Using where; Using join buffer  |
|  3 | DERIVED     | log_codes   | ref  | IX_overview_code | IX_overview_code | 1       |      |    56 |                                 |
|  2 | DERIVED     | log_entries | ALL  | IX_partner_code  | NULL             | NULL    | NULL | 22787 | Using where                     |
+----+-------------+-------------+------+------------------+------------------+---------+------+-------+---------------------------------+

With your indexes, no ordering:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code);
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
| id | select_type | table       | type  | possible_keys         | key                   | key_len | ref  | rows  | Extra                          |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL   | NULL                  | NULL                  | NULL    | NULL |    57 |                                |
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                  | NULL                  | NULL    | NULL | 21937 | Using where; Using join buffer |
|  3 | DERIVED     | log_codes   | index | IX_overview_code_desc | IX_overview_code_desc | 771     | NULL |    80 | Using where; Using index       |
|  2 | DERIVED     | log_entries | index | IX_partner_code_date  | IX_partner_code_date  | 15      | NULL | 22787 | Using where; Using index       |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+--------------------------------+

With your indexes, ORDER BY date DESC:

SELECT log_entries.date, log_codes.log_desc FROM (SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries INNER JOIN (SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes USING (log_code) ORDER BY log_entries.date DESC;
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type  | possible_keys         | key                   | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived3>  | ALL   | NULL                  | NULL                  | NULL    | NULL |    57 | Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                  | NULL                  | NULL    | NULL | 21937 | Using where; Using join buffer  |
|  3 | DERIVED     | log_codes   | index | IX_overview_code_desc | IX_overview_code_desc | 771     | NULL |    80 | Using where; Using index        |
|  2 | DERIVED     | log_entries | index | IX_partner_code_date  | IX_partner_code_date  | 15      | NULL | 22787 | Using where; Using index        |
+----+-------------+-------------+-------+-----------------------+-----------------------+---------+------+-------+---------------------------------+

UPDATE @Joe Stefanelli:

SELECT log_entries.date, log_codes.log_desc FROM log_entries INNER JOIN log_codes ON log_codes.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY date DESC;
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys            | key             | key_len | ref                      | rows | Extra                                        |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | log_codes   | ALL  | PRIMARY,IX_code_overview | NULL            | NULL    | NULL                     |   80 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries | ref  | IX_code,IX_code_partner  | IX_code_partner | 7       | log_codes.log_code,const |   25 | Using where                                  |
+----+-------------+-------------+------+--------------------------+-----------------+---------+--------------------------+------+----------------------------------------------+
3
Please try SELECT log_entries.date, lc.log_desc FROM log_entries INNER JOIN (SELECT log_desc, log_code FROM log_codes WHERE category_overview = 1) AS lc ON lc.log_code = log_entries.log_code WHERE log_entries.partner_id = 1 ORDER BY log_entries.logentry_id and post back. - Eugen Rieck
@Eugen Rieck Thanks. I've added EXPLAIN output of your query to the question. Query performance is about the same (0.6 sec on LIMIT 0,25). - elaxsj

3 Answers

2
votes

I think most of the problems here and in similar questions come from misunderstanding how MySQL (and other databases) uses indexes for sorting. The answer is: MySQL does not use indexes to sort; it can only read data in the order of an index, or in the reverse direction. If you happen to want the data ordered the same way as the index currently in use, you are lucky; otherwise the result has to be sorted (hence filesort in EXPLAIN).

That is, the order of the whole result mostly depends on which table comes first in the join. And if you look at your EXPLAIN you will see that the join starts from the 'log_codes' table (because it is much smaller).

Basically, what you need is a composite index (partner_id, date) on 'log_entries', a covering composite index (log_code, category_overview, log_desc) for 'log_codes', change 'INNER JOIN' to 'STRAIGHT_JOIN' to force the join order, and order by 'date' DESC (this index will fortunately be covering too).

UPD1: I am sorry, I mistyped the index for the first table: it should be (partner_id, log_code, date).
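
The indexes described above could be created roughly as follows. This is only a sketch, assuming the corrected column order from UPD1. The name IX_partner_code_date matches the one that appears in the question's EXPLAIN output; IX_code_overview_desc is a made-up name used for illustration.

```sql
-- Composite index on log_entries, in the corrected order from UPD1.
ALTER TABLE log_entries
  ADD INDEX IX_partner_code_date (partner_id, log_code, `date`);

-- Covering index on log_codes, so log_desc can be read from the index.
-- IX_code_overview_desc is a hypothetical name, not one from the question.
ALTER TABLE log_codes
  ADD INDEX IX_code_overview_desc (log_code, category_overview, log_desc);
```

Because log_desc is a varchar(255), the second index is wide. The question's EXPLAIN output reports a key_len of 771 for a similar index, so it appears to fit with the character set used there, but that may not hold for every character set or storage engine.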

But I still struggle to understand why MySQL chooses to "use temporary" on the log_codes table (with 100x the query time) when I try to sort on a column in another table?

MySQL can either output data directly, as long as you accept the order in which it reads the rows, or put the data in a temporary table, sort it, and then output it. When you order by a field from any non-first table in the join, MySQL has to sort the data (not just output it in index order), and to sort it, it needs a temporary table.

But as I get further into the dataset it is slower (6 sec for LIMIT 50000,25). Do you know why?

To output rows 50000,25, MySQL still needs to fetch the first 50000 rows and skip them. Since I missed a column in the index, MySQL not only scanned the index but also made an additional on-disk lookup for the log_code value of each item. With the covering index that should be much faster, since all data can be fetched from the index.

UPD2: try to force the index:

SELECT log_entries.date, log_codes.log_desc
FROM log_entries FORCE INDEX (IX_partner_code_date)
STRAIGHT_JOIN log_codes
  ON log_codes.log_code = log_entries.log_code
WHERE log_entries.partner_id = 1
  AND log_codes.category_overview = 1
ORDER BY log_entries.date DESC;
1
votes

You are going to need two things

REFACTOR THE QUERY

SELECT log_entries.date, log_codes.log_desc FROM 
(SELECT log_code,date FROM log_entries WHERE partner_id = 1) log_entries
INNER JOIN
(SELECT log_code,log_desc FROM log_codes WHERE category_overview = 1) log_codes
USING (log_code); 

CREATE INDEXES TO SUPPORT SUBQUERIES AND REDUCE TABLE ACCESS

Before creating these indexes, run these

SELECT COUNT(1) rowcount,partner_id FROM log_entries GROUP BY partner_id;
SELECT COUNT(1) rowcount,category_overview FROM log_codes GROUP BY category_overview;

If none of the counts for the possible partner_id values exceeds 5% of the log_entries table, create this index

ALTER TABLE log_entries ADD INDEX (partner_id,log_code,date);

If none of the counts for the possible category_overview values exceeds 5% of the log_codes table, create this index

ALTER TABLE log_codes ADD INDEX (category_overview,log_code,log_desc);
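
The 5% check described above can also be computed directly instead of comparing raw counts by hand. The following is a sketch only; the pct_of_table alias is introduced here for illustration.

```sql
-- Share of log_entries rows per partner_id, largest first.
SELECT partner_id,
       COUNT(*) AS rowcount,
       100 * COUNT(*) / (SELECT COUNT(*) FROM log_entries) AS pct_of_table
FROM log_entries
GROUP BY partner_id
ORDER BY pct_of_table DESC;

-- Share of log_codes rows per category_overview value, largest first.
SELECT category_overview,
       COUNT(*) AS rowcount,
       100 * COUNT(*) / (SELECT COUNT(*) FROM log_codes) AS pct_of_table
FROM log_codes
GROUP BY category_overview
ORDER BY pct_of_table DESC;
```

If the first row of either result is above 5, the corresponding condition is not met.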

Give it a Try !!!

Please try this refactored query with LIMIT 0,25 included

SELECT log_entries.date, log_codes.log_desc FROM 
(
    SELECT A.log_code FROM 
    (SELECT log_code FROM log_entries WHERE partner_id = 1) A INNER JOIN
    (SELECT log_code FROM log_codes WHERE category_overview = 1) B USING (log_code)
    LIMIT 0,25
) log_code_keys
INNER JOIN log_entries USING (log_code)
INNER JOIN log_codes USING (log_code);
0
votes

I'd start by reversing the columns in the IX_partner_code and IX_overview_code indexes. That should make them better suited to support both the JOIN and the WHERE clause.

...
KEY `IX_code_partner` (`log_code`,`partner_id`)
...
KEY `IX_code_overview` (`log_code`,`category_overview`),
...
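
On the existing tables, the reversal could be applied along these lines. This is a sketch that assumes the original index names from the question and simply replaces each index with its reversed counterpart.

```sql
-- Replace (partner_id, log_code) with (log_code, partner_id).
ALTER TABLE log_entries
  DROP INDEX IX_partner_code,
  ADD INDEX IX_code_partner (log_code, partner_id);

-- Replace (category_overview, log_code) with (log_code, category_overview).
ALTER TABLE log_codes
  DROP INDEX IX_overview_code,
  ADD INDEX IX_code_overview (log_code, category_overview);
```

After this change, the single-column IX_code index on log_entries is a left prefix of IX_code_partner, so it may no longer be needed.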