2 votes

I have a Job table with a primary key and a couple of date-related fields ...

+------------------+---------------+------+-----+-------------------+----------------+
| Field            | Type          | Null | Key | Default           | Extra          |
+------------------+---------------+------+-----+-------------------+----------------+
| jobId            | bigint(20)    | NO   | PRI | NULL              | auto_increment |
| creationDateTime | datetime      | NO   | MUL | NULL              |                |
| lastModified     | timestamp     | NO   | MUL | CURRENT_TIMESTAMP |                |
+------------------+---------------+------+-----+-------------------+----------------+

This table has 426,579 rows and the following indexes:

+---------+------------+--------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name                 | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+--------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| Job     |          0 | PRIMARY                  |            1 | jobId            | A         |      439957 |     NULL | NULL   |      | BTREE      |         |
| Job     |          1 | Job_lastModified_idx     |            1 | lastModified     | A         |      439957 |     NULL | NULL   |      | BTREE      |         |
| Job     |          1 | Job_creationDateTime_idx |            1 | creationDateTime | A         |      439957 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+--------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+

Now a query such as ...

select * from Job
 where jobId > 1000
 and creationDateTime between '2011-09-07 18:29:24' and '2011-09-07 20:00:33';

runs very fast (0s), as expected,

with an explain of...

+----+-------------+-------+-------+----------------------------------+--------------------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys                    | key                      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+----------------------------------+--------------------------+---------+------+------+-------------+
|  1 | SIMPLE      | Job   | range | PRIMARY,Job_creationDateTime_idx | Job_creationDateTime_idx | 8       | NULL |   39 | Using where |
+----+-------------+-------+-------+----------------------------------+--------------------------+---------+------+------+-------------+

I see the same speed with an equivalent query targeting lastModified instead of creationDateTime.


However, the slightly more complex query ...

select * from Job
 where jobId > 1000
 and (lastModified between '2011-09-07 18:29:24' and '2011-09-07 20:00:33' 
      or creationDateTime between '2011-09-07 18:29:24' and '2011-09-07 20:00:33');

runs slowly (9 sec) and for some reason ends up doing a much larger scan, even though, as expected, it returns the same number of rows (40):

+----+-------------+-------+-------+-------------------------------------------------------+---------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys                                         | key     | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+-------------------------------------------------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | Job   | range | PRIMARY,Job_lastModified_idx,Job_creationDateTime_idx | PRIMARY | 8       | NULL | 204581 | Using where |
+----+-------------+-------+-------+-------------------------------------------------------+---------+---------+------+--------+-------------+

(Note: this table has 426,579 rows, so I'm unsure where the 204,581 comes from.)


I would have thought this would be equally fast?

Why is MySQL unable to make use of these indexes when composed in this way?

Creating some additional composite indexes on lastModified and creationDateTime didn't help either:

 create index test_idx1 on Job (lastModified,creationDateTime);
 create index test_idx2 on Job (jobId,lastModified,creationDateTime);

I must be missing something simple?

1
Do a 'show table status' and see how big the key files are. The keys may fit into buffers when used individually, but exceed the buffer when combined, so the keys have to be reloaded from scratch for each query. And in some cases, MySQL will decide that it's easier to just scan the tables anyway, regardless of key availability. – Marc B
It might also be worth running OPTIMIZE TABLE over this table, if you haven't already done so. – cdhowie
Could be MySQL decides that it would be simpler to do partial table scans once it sees the and/or construct. – Mouse Food
Just out of curiosity, how does a distinct union of two selects (one using lastModified and the other creationDateTime) perform? – hatchet - done with SOverflow
See stackoverflow.com/questions/7110964/…, it's almost a duplicate question. – Bill Karwin

1 Answer

1 vote

From the reference posted by @Bill, it looks like our version of MySQL (5.0.67) doesn't support index merging, so no index would help in this case.

I'll go with a union, which is fast. Thanks, folks!
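For reference, the union rewrite suggested in the comments might look something like this (a sketch against the schema above; each branch keeps a single range predicate, so each can use its own single-column index):

    -- Each SELECT can use one index: Job_lastModified_idx
    -- or Job_creationDateTime_idx respectively.
    -- Plain UNION (not UNION ALL) deduplicates, so a row whose
    -- lastModified AND creationDateTime both fall in the range
    -- is returned only once, matching the original OR query.
    select * from Job
     where jobId > 1000
       and lastModified between '2011-09-07 18:29:24' and '2011-09-07 20:00:33'
    union
    select * from Job
     where jobId > 1000
       and creationDateTime between '2011-09-07 18:29:24' and '2011-09-07 20:00:33';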