0
votes

Here is the query:

select timespans.id as timespan_id, count(*) as num
 from reports, timespans
 where  timespans.after_date >= '2011-04-13 22:08:38' and
        timespans.after_date <= reports.authored_at and
        reports.authored_at < timespans.before_date
 group by timespans.id;

Here are the table defs:

CREATE TABLE `reports` (
  `id` int(11) NOT NULL auto_increment,
  `source_id` int(11) default NULL,
  `url` varchar(255) default NULL,
  `lat` decimal(20,15) default NULL,
  `lng` decimal(20,15) default NULL,
  `content` text,
  `notes` text,
  `authored_at` datetime default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  `data` text,
  `title` varchar(255) default NULL,
  `author_id` int(11) default NULL,
  `orig_id` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_reports_on_title` (`title`),
  KEY `index_content_on_reports` (`content`(128))

CREATE TABLE `timespans` (
  `id` int(11) NOT NULL auto_increment,
  `after_date` datetime default NULL,
  `before_date` datetime default NULL,
  `after_offset` int(11) default NULL,
  `before_offset` int(11) default NULL,
  `is_common` tinyint(1) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  `is_search_chunk` tinyint(1) default NULL,
  `is_day` tinyint(1) default NULL,
  PRIMARY KEY  (`id`),
  KEY `index_timespans_on_after_date` (`after_date`),
  KEY `index_timespans_on_before_date` (`before_date`)

And here is the explain:

+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                        |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     84 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | reports   | ALL   | NULL                                                         | NULL                          | NULL    | NULL | 183297 | Using where                                  | 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+

And here is the explain after I create an index on authored_at. As you can see, the index is not actually getting used (I think...)

+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                          |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     86 | Using where; Using temporary; Using filesort   | 
|  1 | SIMPLE      | reports   | ALL   | index_reports_on_authored_at                                 | NULL                          | NULL    | NULL | 183317 | Range checked for each record (index map: 0x8) | 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+

There are about 142k rows in the reports table, and far fewer in the timespans table.

The query is taking about 3 seconds now.

The strange thing is that if I add an index on reports.authored_at, it actually makes the query far slower, about 20 seconds. I would have thought it would do the opposite, since it would make it easy to find the reports at either end of the range, and throw the rest away, rather than having to examine all of them.

Can someone clarify? I'm stumped.

3
please put you explain result and your table defination,tkxneocanable
There really should be an index on reports.authored_at. What does EXPLAIN say after that column is indexed?Wiseguy

3 Answers

1
votes

Instead of two separate indexes for the timespan table, try merging them into a single multi-column index with before_date and after_date in a single index. Then add that index to authored_at as well.

1
votes

i rewrite you query like this:

select t.id, count(*) as num from timespans t 
  join reports r where t.after_date >= '2011-04-13 22:08:38' 
  and r.authored_at >= '2011-04-13 22:08:38' 
  and r.authored_at < t.before_date 
group by t.id order by null;

and change indexes of tables

alter table reports add index authored_at_idx(authored_at);
0
votes

You can used partition feature of database on column after_date. It will help u a lot.