MySQL query optimization

Question

Could you please help me optimize this query. I've spent lots of time and still cannot rephrase it to be fast enough (say running in the matters of seconds, not minutes as it is now).

The query:

SELECT m.my_id, m.my_value, m.my_timestamp
  FROM (
    SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
      FROM my_table
      WHERE my_timestamp < '2011-03-01 08:00:00'
      GROUP BY my_id
  ) as tmp
LEFT OUTER JOIN my_table m
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;

my_table is defined as follows:

CREATE TABLE my_table (
   my_id INTEGER NOT NULL,
   my_value VARCHAR(4000),
   my_timestamp TIMESTAMP default CURRENT_TIMESTAMP NOT NULL,
   INDEX MY_ID_IDX (my_id),
   INDEX MY_TIMESTAMP_IDX (my_timestamp),
   INDEX MY_ID_MY_TIMESTAMP_IDX (my_id, my_timestamp)
);

The goal of this query is to select the most recent my_value for each my_idbefore some timestamp. my_table contains ~100 million entries and it takes ~8 minutes to perform it.

explain:

+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| id | select_type | table       | type  | possible_keys                                  | key                     | key_len | ref                       | rows  | Extra                                 |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                                           | NULL                    | NULL    | NULL                      | 90721 | Using temporary; Using filesort       |
|  1 | PRIMARY     | m          | ref   | MY_ID_IDX,MY_TIMESTAMP_IDX,MY_ID_TIMESTAMP_IDX | MY_TIMESTAMP_IDX        | 4       | tmp.most_recent_timestamp |    1  | Using where                           |
|  2 | DERIVED     | my_table    | range | MY_TIMESTAMP_IDX                               | MY_ID_MY_TIMESTAMP_IDX  | 8       | NULL                      | 61337 | Using where; Using index for group-by |
+----+-------------+-------------+-------+------------------------------------------------+-----------------------+---------+---------------------------+------+---------------------------------------+

Are you sure that's the query plan for the query you posted? The plan mentions table nv, but there's no such table in the query. The query may not even be correct, as the value for my_id in the sub-select may not be (indeed isn't likely to be) the id for the row where my_timestamp = MAX(my_timestamp). — outis
Shouldn't your join condition be ...AND tmp.most_recent_timestamp = m.my_timestamp...? The inner query also looks to be missing a GROUP BY. — Joe Stefanelli
Can you explain what "SELECT my_id, MAX(my_timestamp) AS .." does and why no group by? — Zimbabao
@outis, I'm sorry. I've modified original explain from the production DB and there might be inconsitencies. I've tried to correct them.| @Joe Stefanelli, yes it does. Seems that I've missed it while preparing and SCCE. @Zimbabao, mysql version 5.1. — Alex Nikolaenkov

sreimer sreimer · Accepted Answer · 2011-03-02T14:32:48

If I understand correctly, you should be able to drop the nested select completely, and move the where clause to the main query, order by my_timestamp descending and limit 1.

SELECT my_id, my_value, max(my_timestamp)
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id

*edit - added max and group by

MySQL query optimization

3 Answers