
I have a MySQL table (articles) with a composite ("nested") index on (blog_id, published), and it performs poorly. I see a lot of entries like this in my slow query log:

# Query_time: 23.184007  Lock_time: 0.000063  Rows_sent: 380  Rows_examined: 6341
SELECT id from articles WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;

I have trouble understanding why MySQL would run through all rows with those blog_ids to find my top 380 rows. I would expect that speeding this up is the whole purpose of the composite index. At the very least, even a naive implementation should look up each blog_id and take its top 380 rows ordered by published. That should be fast, since the composite index gives exactly those rows, already in order. Then it only has to sort the resulting 19 * 380 = 7220 rows.
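A minimal Python sketch of that naive plan, assuming each blog_id's rows arrive as a list of hypothetical (published, id) tuples already sorted newest first (which is what a backward read of a (blog_id, published) index yields). Note that it ignores category_id = 11: that column is not in the index, so in practice each stream would have to be filtered first, and one blog_id might need more than 380 index entries to yield 380 matching rows.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # hypothetical (published, id) pair


def top_n_naive(streams: Dict[int, List[Row]], n: int) -> List[Row]:
    """Take the first n rows of each per-blog_id stream (each sorted by
    published DESC), pool them, sort once, and keep the global top n."""
    pooled: List[Row] = []
    for rows in streams.values():
        pooled.extend(rows[:n])      # at most n rows per blog_id
    pooled.sort(reverse=True)        # one sort over <= len(streams) * n rows
    return pooled[:n]


streams = {13: [(50, 1), (40, 2)], 14: [(45, 3), (10, 4)]}
print(top_n_naive(streams, 2))       # [(50, 1), (45, 3)]
```

This is correct because the global top n can contain at most n rows from any single blog_id, so nothing outside each stream's first n rows can qualify.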

If one were to implement it optimally, you would put the head of each per-blog_id stream into a heap, pick the one with the max(published), advance that stream, and repeat 380 times. Each operation should be fast (O(log 19) for 19 streams).
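The heap-based merge can be sketched in Python with `heapq`, under the same assumption of hypothetical per-blog_id lists of (published, id) tuples sorted newest first. The heap holds at most one entry per stream, so each of the n pops costs O(log k) for k streams.

```python
import heapq
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # hypothetical (published, id) pair


def top_n_heap(streams: Dict[int, List[Row]], n: int) -> List[Row]:
    """k-way merge of streams sorted by (published, id) DESC; returns the top n."""
    # heapq is a min-heap, so negate the sort keys to pop the newest row first.
    heap = [(-rows[0][0], -rows[0][1], blog_id, 0)
            for blog_id, rows in streams.items() if rows]
    heapq.heapify(heap)
    result: List[Row] = []
    while heap and len(result) < n:
        neg_pub, neg_id, blog_id, pos = heapq.heappop(heap)
        result.append((-neg_pub, -neg_id))
        rows = streams[blog_id]
        nxt = pos + 1
        if nxt < len(rows):          # advance only the stream we just consumed
            heapq.heappush(heap, (-rows[nxt][0], -rows[nxt][1], blog_id, nxt))
    return result


streams = {13: [(50, 1), (40, 2)], 14: [(45, 3), (10, 4)]}
print(top_n_heap(streams, 3))        # [(50, 1), (45, 3), (40, 2)]
```

The standard library's `heapq.merge(*streams.values(), reverse=True)` performs the same lazy merge; wrapping it in `itertools.islice(..., n)` takes the first n rows.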

I'm surely missing something, since Google, Facebook, Twitter, Microsoft and other big companies use MySQL in production. Does anyone have experience with this?

Edit: Updating in response to thieger's answer. I tried an index hint, and it doesn't seem to help; the results are at the end of this post. The MySQL ORDER BY optimization documentation seems to address the concern thieger raises:

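thieger: "I agree that MySQL might possibly use the composite blog_id-published-index, but only for the blog_id part of the query."

The manual, however, lists this as a query that can use an index on (key_part1, key_part2) to resolve the ORDER BY:

SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;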
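To see the difference between the manual's example (one constant for key_part1) and this query (an IN list), here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. That is an assumption: SQLite's planner is not MySQL's, but the ordering argument is the same. With one blog_id, the index entries are already in published order; with several, they come out grouped by blog_id, so a separate sort is needed. SQLite reports that sort as "USE TEMP B-TREE FOR ORDER BY"; MySQL's EXPLAIN reports it as "Using filesort". The trimmed table and the index name `idx_blog_published` are hypothetical.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column (last column) of EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, "
             "category_id INT, blog_id INT, published INT)")
conn.execute("CREATE INDEX idx_blog_published ON articles (blog_id, published)")

# Manual's pattern: a single constant for key_part1, ORDER BY key_part2.
single = plan(conn, "SELECT id FROM articles WHERE blog_id = 13 "
                    "ORDER BY published DESC LIMIT 380")
# Question's pattern: several values for key_part1, ORDER BY key_part2.
multi = plan(conn, "SELECT id FROM articles WHERE blog_id IN (13, 14, 15) "
                   "ORDER BY published DESC LIMIT 380")
print(single)  # index read in order, no separate sort step
print(multi)   # also includes a temp B-tree step for the ORDER BY
```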

At least, MySQL seems to claim the index can be used beyond just the WHERE clause (the blog_id part of the query). Any help, thieger?

Thanks, -Prasanna [myprasanna at gmail dot com]

CREATE TABLE IF NOT EXISTS `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category_id` int(11) DEFAULT NULL,
  `blog_id` int(11) DEFAULT NULL,
  `cluster_id` int(11) DEFAULT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `description` text COLLATE utf8_unicode_ci,
  `keywords` text COLLATE utf8_unicode_ci,
  `image_url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `author` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `categories` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `published` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `is_image_crawled` tinyint(1) DEFAULT NULL,
  `image_candidates` text COLLATE utf8_unicode_ci,
  `title_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `article_readability_crawled` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_articles_on_url_hash` (`url_hash`),
  KEY `index_articles_on_cluster_id` (`cluster_id`),
  KEY `index_articles_on_published` (`published`),
  KEY `index_articles_on_is_image_crawled` (`is_image_crawled`),
  KEY `index_articles_on_category_id` (`category_id`),
  KEY `index_articles_on_title_hash` (`title_hash`),
  KEY `index_articles_on_article_readability_crawled` (`article_readability_crawled`),
  KEY `index_articles_on_blog_id` (`blog_id`,`published`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=562907 ;

SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;

....
380 rows in set (11.27 sec)

explain SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: articles
         type: range
possible_keys: index_articles_on_blog_id
          key: index_articles_on_blog_id
      key_len: 5
          ref: NULL
         rows: 8640
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

3 Answers


Did you try EXPLAIN to see whether your index is used at all? Did you ANALYZE to update the index statistics?

I agree that MySQL might possibly use the composite blog_id-published-index, but only for the blog_id part of the query. If the index is still not used after ANALYZE, you can try giving MySQL a hint with USE INDEX or even FORCE INDEX, but the MySQL optimizer may also be correct in assuming that a sequential scan is faster than using the index. For your kind of query, I would also suggest adding an index on (category_id, blog_id) and trying that.


Aside from thieger's excellent answer, you might also want to check:

  • whether an index on (category_id, blog_id, published) helps.
  • whether there is enough memory to keep all indexes cached (check InnoDB buffer pool usage and flushes, for instance; mysqlreport is a very handy tool for that).
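One way to try the (category_id, blog_id, published) index suggested above is to pair it with a UNION ALL rewrite. That rewrite is not from either answer; it is a common workaround that turns the asker's "top rows per blog_id, then sort" plan into SQL. In MySQL each branch is a parenthesized SELECT with its own ORDER BY/LIMIT, e.g. `(SELECT id, published FROM articles WHERE category_id = 11 AND blog_id = 13 ORDER BY published DESC LIMIT 380) UNION ALL (...) ORDER BY published DESC LIMIT 380`. The sketch below checks only that the rewrite returns the same rows as the original query. It runs on SQLite via `sqlite3` (an assumption; SQLite needs subqueries for per-branch LIMIT) with synthetic data and a hypothetical index name, so it says nothing about MySQL timings.

```python
import random
import sqlite3

BLOG_IDS = [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27,
            6330, 6331, 8269, 12218, 18889]   # IN list from the question
N = 380

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, "
             "category_id INT, blog_id INT, published INT)")
# Equality columns first, sort column last (hypothetical index name).
conn.execute("CREATE INDEX idx_cat_blog_pub "
             "ON articles (category_id, blog_id, published)")

# Synthetic data (assumption): unique published values make the top N unambiguous.
random.seed(42)
pubs = random.sample(range(10_000_000), 20_000)
conn.executemany(
    "INSERT INTO articles (category_id, blog_id, published) VALUES (?, ?, ?)",
    [(random.choice([10, 11, 12]), random.choice(BLOG_IDS), p) for p in pubs])

in_list = ", ".join(map(str, BLOG_IDS))
original = (f"SELECT id FROM articles WHERE category_id = 11 "
            f"AND blog_id IN ({in_list}) ORDER BY published DESC LIMIT {N}")

# One branch per blog_id; each reads at most N index entries in published order.
branches = " UNION ALL ".join(
    f"SELECT * FROM (SELECT id, published FROM articles "
    f"WHERE category_id = 11 AND blog_id = {b} "
    f"ORDER BY published DESC LIMIT {N})"
    for b in BLOG_IDS)
rewritten = f"SELECT id FROM ({branches}) ORDER BY published DESC LIMIT {N}"

rows_original = [r[0] for r in conn.execute(original)]
rows_rewritten = [r[0] for r in conn.execute(rewritten)]
print(len(rows_original), rows_original == rows_rewritten)
```

The rewrite is safe because the global top 380 can include at most 380 rows from any one blog_id, so every qualifying row appears in some branch.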

MySQL has a cutoff mechanism where if it detects that it will probably have to look at more than about a third of the table anyway, it won't use the index. Since it appears your query will match just over 6000 rows of an 8000-odd row table, that is definitely what is happening.

In addition, MySQL can't usually use an index twice on the same table, nor can it use more than one. In this case, it won't use the index for the ORDER BY clause because it has different columns specified than in the WHERE clause.
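A small Python illustration of why the ORDER BY can't be read straight from the (blog_id, published) index in this query, using a handful of hypothetical index entries: reading the entries for an IN list returns them grouped by blog_id, so published is sorted only within each group and the combined result still has to be sorted.

```python
from typing import List, Tuple

# Hypothetical (blog_id, published) index entries, kept in index order.
index_entries: List[Tuple[int, int]] = sorted([
    (13, 100), (13, 400), (14, 250), (14, 300), (15, 50), (15, 500),
])


def range_read(entries: List[Tuple[int, int]], blog_ids: List[int]) -> List[Tuple[int, int]]:
    """Return the entries whose blog_id is in the IN list, in index order."""
    wanted = set(blog_ids)
    return [e for e in entries if e[0] in wanted]


scanned = range_read(index_entries, [13, 14, 15])
published = [p for _, p in scanned]
print(published)                      # [100, 400, 250, 300, 50, 500]
needs_sort = published != sorted(published)
print(needs_sort)                     # True: ordered only within each blog_id
```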