2
votes

I have the following query

SELECT a.id, b.id from table1 AS a, table2 AS b
WHERE a.table2_id IS NULL
AND a.plane = SUBSTRING(b.imb, 1, 20)
AND (a.stat LIKE "f%" OR a.stat LIKE "F%")

Here is the output of EXPLAIN

+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys                                                                             | key                          | key_len | ref  | rows     | Extra       |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+
|  1 | SIMPLE      | b     | ALL  | NULL                                                                                      | NULL                         | NULL    | NULL | 28578039 |             |
|  1 | SIMPLE      | a     | ref  | index_on_plane,index_on_table2_id_id,mysql_confirmstat_on_stat                            | index_on_plane               | 258     |  func|        2 | Using where |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+

The query takes around 80 minutes to execute.

The indexes on table1 are as follows

+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name                       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 |          0 | PRIMARY             |            1 | id         | A         |    50319117 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          1 | index_on_post       |            1 | post       | A         |     7188445 |     NULL | NULL   | YES  | BTREE      |         |               |
| table1 |          1 | index_on_plane      |            1 | plane      | A         |    25159558 |     NULL | NULL   | YES  | BTREE      |         |               |
| table1 |          1 | index_on_table2_id  |            1 | table2_id  | A         |    25159558 |     NULL | NULL   | YES  | BTREE      |         |               |
| table1 |          1 | index_on_stat       |            1 | stat       | A         |         187 |     NULL | NULL   | YES  | BTREE      |         |               |
+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

and table2 indexes are.

+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name                  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table2 |          0 | PRIMARY                 |            1 | id             | A         |    28578039 |     NULL | NULL   |      | BTREE      |         |               |
| table2 |          1 | index_on_post           |            1 | post           | A         |    28578039 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2 |          1 | index_on_ver            |            1 | ver            | A         |        1371 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2 |          1 | index_on_imb            |            1 | imb            | A         |    28578039 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

How can the execution time of this query be improved?

Here is the updated explain

EXPLAIN SELECT STRAIGHT_JOIN a.id, b.id from table1 AS a JOIN b AS b  
ON a.plane=substring(b.imb,1,20) 
WHERE a.table2_id IS NULL  
AND (a.stat LIKE "f%" OR a.stat LIKE "F%");
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+
| id | select_type | table | type | possible_keys                                       | key                | key_len | ref   | rows     | Extra                          |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+
|  1 | SIMPLE      | a     | ref  | index_on_plane,index_on_table2_id,index_on_stat     | index_on_table2_id | 5       | const |   500543 | Using where                    |
|  1 | SIMPLE      | b     | ALL  | NULL                                                | NULL               | NULL    | NULL  | 28578039 | Using where; Using join buffer |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+

SQL fiddle link http://www.sqlfiddle.com/#!2/362a6/4

4
try joining tables like ` ... from table1 a left join table2 b on a.plane=substring(b.imb,...)` - what will you receive?user15
in the explain result, what do you mean by S and CS. Is it table1 and table2 respectively?ursitesion
You could try to put an index on table2 for the first 20 characters of table2.imb. stackoverflow.com/questions/10595037/…Brian Hoover
@ursitesion updated the questionandroidharry
@BrianHoover there is already an index on imb in table2androidharry

4 Answers

1
votes

Your schema dooms your query to slowness, in at least three ways. You are going to need to modify your schema to get anything like decent performance out of this. I can see three ways to fix your schema.

First way (probably very easy to fix):

  a.stat LIKE "f%" OR a.stat LIKE "F%"

This OR operation likely doubles the runtime of your query. But if you set the collation of your stat column to something case-insensitive you can change this to

  a.stat LIKE "f%"

You already have an index on this column.

Second way (maybe not so hard to fix). This clause definitively defeats the use of an index; they're useless when NULL values are involved.

WHERE a.table2_id IS NULL

Can you change the definition of table2_id to NOT NULL and provide a default value (perhaps zero) to indicate missing data? If so, you'll be in good shape because you'll be able to change this to a search predicate that uses an index.

WHERE a.table2_id = 0

Third way (probably hard). The presence of the function in this clause defeats the use of an index in joining.

WHERE ... a.plane = SUBSTRING(b.imb, 1, 20)

You need to make an extra column (yeah, yeah, in Oracle it could be a function index, but who has that kind of money?) called b.plane or something with that substring stored in it.

If you do all this stuff and refactor your query just a bit, here's what it will look like:

SELECT a.id AS aid, 
       b.id AS bid
  FROM table1 AS a
  JOIN table2 AS b ON a.plane = b.plane /* the new column */
 WHERE a.stat LIKE 'f%'
   AND a.table2_id = 0

Finally, you can probably tweak this performance up a bit by creating the following compound indexes as covering indexes for the query. Look up covering indexes if you're not sure what that means.

 table1  (table2_id, stat, plane, id)
 table2  (plane, id)  /* plane is your new column */

There's a tradeoff in covering indexes: they slow down insertion and update operations, but speed up queries. Only you have enough information to make that tradeoff wisely.

0
votes

Column on which join operation is getting perform must be indexed and MySQL optimiser should use it for better performance. It will minimise the number of rows examined (join size)

Try this one

SELECT STRAIGHT_JOIN a.id, b.id from table1 AS a JOIN table2 AS b ON a.plane=substring(b.imb,1,20)
WHERE a.table2__id IS NULL and (a.stat LIKE "f%" OR a.stat LIKE "F%")

Check the execution plan first. If it is even not using the index_on_imb index, create one composite index combining table2.imb and table2.id in which table2.imb would be top in order.

0
votes

An derived table may improve performance in this case depends on this indexes index_on_table2_id,index_on_stat..

SELECT a.id, b.id from table1 AS a, table2 AS b
WHERE a.table2_id IS NULL
AND a.plane = SUBSTRING(b.imb, 1, 20)
AND (a.stat LIKE "f%" OR a.stat LIKE "F%")

May be rewritten to.. The derived table will force MySQL into checking 500543 rows like the last explain said

SELECT a.id, b.id
FROM (SELECT plane FROM table1 WHERE (a.table2_id IS NULL) AND (a.stat LIKE "f%" OR a.stat LIKE "F%")) a
INNER JOIN table2 b
ON a.plane = SUBSTRING(b.imb, 1, 20)
0
votes

Aside from my comment about ID colmns, it appears you are trying to back-fill a join on the "plane" instead of by the ID columns. If I am correct, you want all records from table2 where there is no match in table1

select
      a.id,
      b.id
   from
      table2 b
         left join table1 a
            on b.id = a.table2_id
           AND substr( b.imb, 1, 20 ) = a.plane
           AND (   a.stat LIKE "f%" 
                OR a.stat LIKE "F%")
   where
      a.table2_id is null

Also, to help the index join, I would have covering indexes so the engine does not have to go back to the raw data to get qualifying records.

table1 -- index ( plane, stat, table2_id, id )
table2 -- index ( imb, id )

But again, please clarify basis of table join do or do not have it based on a Key... Per the sample columns of table1 having a column table2_id, I am GUESSING this relates to table2.id.

The purpose of doing a left-join basically says... For each record in the left-side table (in my example table2), join to the right-side table (table1) on whatever criteria/conditions -- now using the KEY ID column as primary basis, then the plane and status setting.

So, even though I'm doing a join between the two tables on the table2_id, if it DOES find a match, it will be excluded... Only when it does NOT find a match will it be included.

Finally, since you are hiding the true basis of the tables, you are leaving it to guessing work of those helping. Even if it was "personal" type of data, you are not showing any data, just how do I get it. Having a better mental image of what you are looking to get is better than bogus table/column names with limited context.