73
votes

How do I create an index on the date part of DATETIME field?

mysql> SHOW COLUMNS FROM transactionlist;
+-------------------+------------------+------+-----+---------+----------------+
| Field             | Type             | Null | Key | Default | Extra          |
+-------------------+------------------+------+-----+---------+----------------+
| TransactionNumber | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| WagerId           | int(11)          | YES  | MUL | 0       |                |
| TranNum           | int(11)          | YES  | MUL | 0       |                |
| TranDateTime      | datetime         | NO   |     | NULL    |                |
| Amount            | double           | YES  |     | 0       |                |
| Action            | smallint(6)      | YES  |     | 0       |                |
| Uid               | int(11)          | YES  |     | 1       |                |
| AuthId            | int(11)          | YES  |     | 1       |                |
+-------------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)

TranDateTime is used to save the date and time of a transaction as it happens

My Table has over 1,000,000 records in it and the statement

SELECT * FROM transactionlist where date(TranDateTime) = '2008-08-17' 

takes a long time.

EDIT:

Have a look at this blog post on "Why MySQL’s DATETIME can and should be avoided"

13
warning comment for the link you suggested a look: The post is written with such an excitement and rage that it almost borders on the point of childishness. And the writer is not beating back any criticism , while still mentioning that he stands behind what he said , yet his point is falling slender with each. But still , not a waste of time, if you read the comments.kommradHomer

13 Answers

67
votes

If I remember correctly, that will run a whole table scan because you're passing the column through a function. MySQL will obediently run the function for each and every column, bypassing the index since the query optimizer can't really know the results of the function.

What I would do is something like:

SELECT * FROM transactionlist 
WHERE TranDateTime BETWEEN '2008-08-17' AND '2008-08-17 23:59:59.999999';

That should give you everything that happened on 2008-08-17.

15
votes

I don't mean to sound cute, but a simple way would be to add a new column that only contained the date part and index on that.

13
votes

Another option (relevant for version 5.7.3 and above) is to create a generated/virtual column based on the datetime column, then index it.

CREATE TABLE `table` (
`my_datetime` datetime NOT NULL,
`my_date` varchar(12) GENERATED ALWAYS AS (DATE(`my_datetime`)) STORED,
KEY `my_idx` (`my_date`)
) ENGINE=InnoDB;
9
votes

You can't create an index on just the date part. Is there a reason you have to?

Even if you could create an index on just the date part, the optimiser would probably still not use it for the above query.

I think you'll find that

SELECT * FROM transactionlist WHERE TranDateTime BETWEEN '2008-08-17' AND '2008-08-18'

Is efficient and does what you want.

4
votes

I don't know about the specifics of mySql, but what's the harm in just indexing the date field in its entirety?

Then just search:

 select * from translist 
     where TranDateTime > '2008-08-16 23:59:59'
        and TranDateTime < '2008-08-18 00:00:00'

If the indexes are b-trees or something else that's reasonable, these should get found quickly.

2
votes

Valeriy Kravchuk on a feature request for this very issue on the MySQL site said to use this method.

"In the meantime you can use character columns for storing DATETIME values as strings, with only first N characters being indexed. With some careful usage of triggers in MySQL 5 you can create a reasonably robust solution based on this idea."

You could write a routine pretty easy to add this column, and then with triggers keep this column synced up. The index on this string column should be pretty quick.

2
votes

The one and good solution that is pretty good working is to use timestamp as time, rather than datetime. It is stored as INT and being indexed good enough. Personally i encountered such problem on transactions table, that has about million records and slowed down hard, finally i pointed out that this caused by bad indexed field (datetime). Now it runs very quick.

1
votes

I don't know about the specifics of mySQL, but what's the harm in just indexing the date field in its entirety?

If you use functional magic for * trees, hashes, ... is gone, because for obtaining values you must call the function. But, because you do not know the results ahead, you have to do a full scan of the table.

There is nothing to add.

Maybe you mean something like computed (calculated?) indexes... but to date, I have only seen this in Intersystems Caché. I don't think there's a case in relational databases (AFAIK).

A good solution, in my opinion, is the following (updated clintp example):

SELECT * FROM translist 
WHERE TranDateTime >= '2008-08-17 00:00:00.0000'
  AND TranDateTime < '2008-08-18 00:00:00.0000'

Whether you use 00:00:00.0000 or 00:00 in my opinion makes no difference (I've generally used it in this format).

1
votes

datetime LIKE something% will not catch the index either.

Use this: WHERE datetime_field >= curdate();
That will catch the index,
and cover today:00:00:00 up to today:23:59:59
Done.

0
votes

What does 'explain' say? (run EXPLAIN SELECT * FROM transactionlist where date(TranDateTime) = '2008-08-17')

If it's not using your index because of the date() function, a range query should run fast:

SELECT * FROM transactionlist where TranDateTime >= '2008-08-17' AND TranDateTime < '2008-08-18'

0
votes

Rather than making an index based on a function (if that is even possible in mysql) make your where clause do a range comparison. Something like:

Where TranDateTime > '2008-08-17 00:00:00' and TranDateTime < '2008-08-17 11:59:59')

This lets the DB use the index on TranDateTime (there is one, right?) to do the select.

0
votes

If modifying the table is an option, or you're writing a new one, consider storing date and time in separate columns with respective types. You get performance by having a much smaller key space, and reduced storage (compared to a date-only column derived from a datetime). This also makes it feasible to use in compound keys, even before other columns.

In OP's case:

+-------------------+------------------+------+-----+---------+----------------+
| Field             | Type             | Null | Key | Default | Extra          |
+-------------------+------------------+------+-----+---------+----------------+
| TransactionNumber | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| WagerId           | int(11)          | YES  | MUL | 0       |                |
| TranNum           | int(11)          | YES  | MUL | 0       |                |
| TranDate          | date             | NO   |     | NULL    |                |
| TranTime          | time             | NO   |     | NULL    |                |
| Amount            | double           | YES  |     | 0       |                |
| Action            | smallint(6)      | YES  |     | 0       |                |
| Uid               | int(11)          | YES  |     | 1       |                |
| AuthId            | int(11)          | YES  |     | 1       |                |
+-------------------+------------------+------+-----+---------+----------------+
-1
votes

Create a new fields with just the dates convert(datetime, left(date_field,10)) and then index that.