Indexes in Mysql Query behaving indifferently

Question

I have a query:

select SQL_NO_CACHE id from users
 where id>1 and id <1000
   and id in  ( select owner_id from comments and content_type='Some_string');

(note that it is short of an actual large query used for my sphinx indexing, representing the problem) This query is taking about 3.5 seconds(modifying range from id = 1..5000 makes it about 15 secs).

users table has about 35000 entries and comments table has about 8000 entries.

Explain on above query:

explain select SQL_NO_CACHE id from users
        where id>1 and id <1000
          and id in  ( select distinct owner_id from d360_core_comments);

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | users | range | PRIMARY | PRIMARY | 4 | NULL | 1992 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | d360_core_comments | ALL | NULL | NULL | NULL | NULL | 6901 | Using where; Using temporary |

where the individual subquery(select owner_id from d360_core_comments where content_type='Community20::Topic';) here is taking almost 0.0 seconds.

However if I add index on owner_id,content_type, (note the order here)

create index tmp_user on d360_core_comments (owner_id,content_type);

My subquery runs as is in ~0.0 seconds with NO index used:

mysql> explain select owner_id from d360_core_comments where content_type='Community20::Topic';

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | d360_core_comments | ALL | NULL | NULL | NULL | NULL | 6901 | Using where |

However now my main query (select SQL_NO_CACHE id from users where id>1 and id <1000 and id in ( select owner_id from d360_core_comments where content_type='Community20::Topic');) now runs in ~0 seconds with following explain:

mysql> explain select SQL_NO_CACHE id from users where id>1 and id <1000 and id in ( select owner_id from d360_core_comments where content_type='Community20::Topic');

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | users | range | PRIMARY | PRIMARY | 4 | NULL | 1992 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | d360_core_comments | index_subquery | tmp_user | tmp_user | 5 | func | 34 | Using where |

So the main questions I have are:

If the index defined on the table used in my subquery is not getting used in my actual subquery then how it is optimizing the query here?
And why in the first place the first query was taking so much time when the actual subquery and main query independently are much faster?

Stack Overflow uses four spaces to identify blocks of code. Quoting things like you've done here makes it much harder to read. — tadman

rsanchez rsanchez · Accepted Answer · 2013-02-01T18:15:00

If you read this section of the MySQL Reference Manual: Optimizing Subqueries with EXISTS Strategy, you'll see that the query optimizer transforms your subquery condition from:

id in ( select distinct owner_id
          from d360_core_comments
         where content_type='Community20::Topic')

into:

exists ( select 1
           from d360_core_comments
          where content_type='Community20::Topic'
            and owner_id = users.id )

This is why a index on (owner_id, content_type) is not useful when the subquery is tested as standalone query, but it is useful when considering the transformed subquery.

Indexes in Mysql Query behaving indifferently

4 Answers