1
votes

I am not sure on how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? Appreciate any input!

All SELECTS contain:

SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
ORDER BY logentry_id DESC

Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:

WHERE partner_id = 1

WHERE log_codes.category_overview = 1

WHERE partner_id = 1 AND log_codes.category_overview = 1

WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1

Database structure:

CREATE TABLE IF NOT EXISTS `log_codes` (
  `log_code` smallint(6) NOT NULL,
  `log_desc` varchar(255),
  `category_mail` tinyint(1) NOT NULL,
  `category_overview` tinyint(1) NOT NULL,
  `category_cron` tinyint(1) NOT NULL,
  `category_documents` tinyint(1) NOT NULL,
  `category_error` tinyint(1) NOT NULL,
  PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `log_entries` (
  `logentry_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `log_code` smallint(6) NOT NULL,
  `partner_id` int(11) NOT NULL,
  `customer_id` int(11) NOT NULL,
  PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

EDIT: Added indexes on fields, here is output of SHOW INDEXES:

+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name              | Seq_in_index | Column_name           | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes |          0 | PRIMARY               |            1 | log_code              | A         |          97 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_mail         |            1 | category_mail         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_overview     |            1 | category_overview     | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_cron         |            1 | category_cron         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_documents    |            1 | category_documents    | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_error        |            1 | category_error        | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries |          0 | PRIMARY      |            1 | logentry_id  | A         |      163020 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | log_code     |            1 | log_code     | A         |          90 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | partner_id   |            1 | partner_id   | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
| log_entries |          1 | customer_id  |            1 | customer_id  | A         |       20377 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EDIT 2: Added composite indexes: (log_code, category_overview) and (log_code, category_overview) on log_codes. (customer_id, partner_id) on log_entries.

Here are some EXPLAIN output (query returns 66818 rows):

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table       | type   | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | SIMPLE      | log_entries | ref    | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | SIMPLE      | log_codes   | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | log_entries.log_code |      1 | Using where                 |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries 
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id 
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id 
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id 
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id 
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id 
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id 
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table         | type   | possible_keys                       | key           | key_len | ref                      | rows | Extra                                        |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | log_codes     | ref    | PRIMARY,code_overview,overview_code | overview_code | 1       | const                    |   54 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries   | ref    | log_code,partner_id                 | log_code      | 2       | log_codes.log_code       | 1811 | Using where                                  |
|  1 | SIMPLE      | partners      | const  | PRIMARY                             | PRIMARY       | 2       | const                    |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table1 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t1_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table2 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t2_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table3 | eq_ref | PRIMARY                             | PRIMARY       | 3       | log_entries.t3_id        |    1 |                                              |
|  1 | SIMPLE      | joined_table4 | eq_ref | PRIMARY                             | PRIMARY       | 3       | joined_table3.t4_id      |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table5 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t5_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table6 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t6_id        |    1 | Using index                                  |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+

Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview               | PRIMARY    | 2       | log_entries.log_code |      1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | func                 |      1 | Using where                 |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

And a overview on customer, query returns 12 rows:

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                                    | key          | key_len | ref                  | rows | Extra                       |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id,customer_id,customer_partner | customer_id  | 4       | const                |   27 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview                            | PRIMARY      | 2       | log_entries.log_code |    1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code              | PRIMARY      | 2       | func                 |    1 | Using where                 |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
2
The correct structure of the composite indices will depend on cardinality. If space is not an issue then you will probably be better off with multiple indices to best serve these different queries. Please post the output from SHOW INDEXES after adding a single column index to each of the related fields?nnichols
Thanks! I added single column indexes to the relevant fields (thanks @Shiplu). Output of SHOW INDEXES added to my question. The category_overview index is used when included when testing a WHERE category_overview=1 query. But on SHOW EXPLAIN, I get "Using where; Using temporary; Using filesort" on the log_codes table, seems to be a major bottleneck.elaxsj

2 Answers

1
votes

Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising but it is amazing the number people who are not prepared to take the time to help themselves.

Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.

The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?

EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.

Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.

As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.

UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -

ALTER TABLE `log_codes`
    ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);

ALTER TABLE `log_entries`
    ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
    ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);

The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.

2
votes

There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.

All subsequent comments are therefore to be taken not as absolute rules:

An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?

For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.

So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.

Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.