
I have a table of bitcoin transactions:

    CREATE TABLE `transactions` (
      `trans_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `trans_exchange` int(10) unsigned DEFAULT NULL,
      `trans_currency_base` int(10) unsigned DEFAULT NULL,
      `trans_currency_counter` int(10) unsigned DEFAULT NULL,
      `trans_tid` varchar(20) DEFAULT NULL,
      `trans_type` tinyint(4) DEFAULT NULL,
      `trans_price` decimal(15,4) DEFAULT NULL,
      `trans_amount` decimal(15,8) DEFAULT NULL,
      `trans_datetime` datetime DEFAULT NULL,
      `trans_sid` bigint(20) DEFAULT NULL,
      `trans_timestamp` int(10) unsigned DEFAULT NULL,
      PRIMARY KEY (`trans_id`),
      KEY `trans_tid` (`trans_tid`),
      KEY `trans_datetime` (`trans_datetime`),
      KEY `trans_timestmp` (`trans_timestamp`),
      KEY `trans_price` (`trans_price`),
      KEY `trans_amount` (`trans_amount`)
    ) ENGINE=MyISAM AUTO_INCREMENT=6162559 DEFAULT CHARSET=utf8;

As you can see from the AUTO_INCREMENT value, the table has over 6 million entries. There will eventually be many more.

I would like to query the table to obtain the max price, min price, total volume, and number of trades during arbitrary time intervals. To accomplish this, I'm using a query like this:

    SELECT 
    DATE_FORMAT( MIN(transactions.trans_datetime),
        '%Y/%m/%d %H:%i:00'
        ) AS trans_datetime,
    SUM(transactions.trans_amount) as trans_volume,
    MAX(transactions.trans_price) as trans_max_price,
    MIN(transactions.trans_price) as trans_min_price,
    COUNT(transactions.trans_id) AS trans_count 
    FROM 
    transactions
    WHERE
    transactions.trans_datetime BETWEEN '2014-09-14 00:00:00' AND '2015-09-13 23:59:00'
    GROUP BY 
    transactions.trans_timestamp DIV 86400

That should select transactions made over a one-year period, grouped by day (86,400 seconds).

The idea is that the trans_timestamp field holds the same moment as trans_datetime, but as a Unix timestamp (I found reading it faster than computing UNIX_TIMESTAMP(trans_datetime) on the fly). Dividing it by the number of seconds I want in each interval puts every row into a bucket for that interval.
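
For illustration, every timestamp within the same UTC day integer-divides to the same bucket number, and multiplying back by 86,400 recovers the bucket's start. A standalone sketch (the literal 1410652800 is 2014-09-14 00:00:00 UTC; FROM_UNIXTIME renders it in the session time zone, so the result shown assumes UTC):

    SELECT 1410652800 DIV 86400 AS bucket,                      -- 16327: days since the epoch
           FROM_UNIXTIME((1410652800 DIV 86400) * 86400) AS bucket_start;
    -- bucket_start = '2014-09-14 00:00:00' when the session time zone is UTC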

The problem: the query is slow. I'm getting 4+ seconds processing time. Here is the result of EXPLAIN:

    id  select_type  table         type  possible_keys                  key   key_len  ref   rows     Extra
    1   SIMPLE       transactions  ALL   trans_datetime,trans_timestmp  NULL  NULL     NULL  6162558  Using where; Using temporary; Using filesort

The question: is it possible to optimize this better? Is this structure or approach flawed? I have tried several approaches, and have only succeeded in making modest millisecond-type gains.

Why transactions.trans_datetime both as an argument to an aggregate function and in the GROUP BY clause? – jarlh
As an aside, in general you should GROUP BY the same thing that you SELECT. So, if you SELECT a DATE_FORMAT, then GROUP BY a DATE_FORMAT. It's true that you don't have to, but not doing so is likely to lead to errors. – Strawberry
The DATE_FORMAT bit wasn't meant to be the thing grouped by; it was only there so some time representation comes out with each group, so the results can be plotted on a graph. Removing it from the query entirely seems to have no effect on speed. – suzerain

1 Answer


Most of the data in the table is from the last 12 months? So you need to touch most of the table? Then there is no way to speed that query up as written. However, you can get the same output orders of magnitude faster...

Create a summary table. It would have a DATE as the PRIMARY KEY, and its columns would be, in effect, the aggregated fields in your SELECT.
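
A minimal sketch of such a table, assuming one row per day; the name transactions_daily and the column names are illustrative, with types mirroring the aggregates in your SELECT:

    CREATE TABLE transactions_daily (
      trans_date      DATE NOT NULL,
      trans_volume    DECIMAL(23,8) NOT NULL,   -- SUM(trans_amount)
      trans_max_price DECIMAL(15,4) NOT NULL,   -- MAX(trans_price)
      trans_min_price DECIMAL(15,4) NOT NULL,   -- MIN(trans_price)
      trans_count     INT UNSIGNED  NOT NULL,   -- COUNT(*)
      PRIMARY KEY (trans_date)
    );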

Once you have initially populated the summary table, maintain it by adding a new row each night for that day's transactions. More in my blog.
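
The nightly job could be a single INSERT ... SELECT along these lines (a sketch against the hypothetical transactions_daily table above; it rolls up yesterday's rows):

    INSERT INTO transactions_daily
          (trans_date, trans_volume, trans_max_price, trans_min_price, trans_count)
    SELECT DATE(trans_datetime),
           SUM(trans_amount),
           MAX(trans_price),
           MIN(trans_price),
           COUNT(*)
    FROM   transactions
    WHERE  trans_datetime >= CURDATE() - INTERVAL 1 DAY
      AND  trans_datetime <  CURDATE()
    GROUP  BY DATE(trans_datetime);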

Then the query to get the desired output would hit this summary table (with only a few hundred rows), not the table with millions of rows.
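
Against that hypothetical summary table, the original query's output becomes a trivial range scan over roughly 365 rows:

    SELECT trans_date,
           trans_volume,
           trans_max_price,
           trans_min_price,
           trans_count
    FROM   transactions_daily
    WHERE  trans_date BETWEEN '2014-09-14' AND '2015-09-13';

Intervals coarser than a day (weeks, months) can be re-aggregated from the daily rows just as cheaply, since SUM, MAX, MIN, and COUNT all combine correctly across days.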