Cross database query takes abnormaly long to run despite using indices as shown by EXPLAIN

Question

Table_A
id int(11)
sku varchar(24)
Total number of records 250000, sku indexed

Table_B
id int(11)
sku varchar(16)

Total number of records - 180000, sku indexed

The following query has been running for over 30 minutes

Select count(*) 
from Table_A 
where not exists(select 1 from Table_B where Table_B.sku = Table_A.sku)

while Select count(*) from Table_B where exists(select 1 from Table_A where Table_A.sku = Table_B.sku) is executed in less than a second.

Could somebody explain what could be happening?

Forgot to mention Table_A and Table_B are in different databases

mysql> describe db1.table_a;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| prodcut_id | int(11)     | NO   | PRI | NULL    |       |
| sku        | varchar(24) | YES  | MUL | NULL    |       |
+------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)



mysql> describe db2.table_b;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| sno   | int(11)     | NO   | PRI | NULL    |       |
| sku   | varchar(24) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)




mysql> explain select count(*) from db2.table_b where not exists(select 1 from db1.table_a where table_a.sku = table_b.sku);
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | table_b | index | NULL          | sku  | 27      | NULL | 181286 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | table_a | ref   | sku           | sku  | 75      | func |      1 | Using where; Using index |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)


mysql> explain select count(*) from db1.table_a where not exists(select 1 from db2.table_b where table_b.sku = table_a.sku);
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | table_a | index | NULL          | sku  | 75      | NULL | 277030 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | table_b | index | NULL          | sku  | 27      | NULL | 181286 | Using where; Using index |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

DEPENDENT SUBQUERY isn't very good, try JOIN solution as @Notulysses sugesst — Unlink
maybee problem should be, that you have sku in first table declared as varchar(24) and in second as varchar(16) — Unlink
Thanks guys for your replies. The problem was with the collation, they were different. Changed the collation and the query completed in about 2 secs. — a5995
@a5995 Thanks for sharing your findings that seem a very likely explanation if MySQL has to convert encodings/collations in the JOIN conditions. This would moreover explain a similar behavior I experienced myself with an imported MySQL dump. I ended up reimporting the data differently and the problem was solved so I didn't care any longer but the underlying reasons were probably the same. Actually, I strongly think that you should post this as an answer to your own question and accept it as it will definitely help others who do cross-database queries in the future. — VH-NZZ

potashin potashin · Accepted Answer · 2014-06-20T13:54:16

In MySQL LEFT JOIN should have a better performance than NOT EXISTS:

SELECT COUNT(a.*) 
FROM Table_A a LEFT JOIN Table_B b ON a.sku = b.sku
WHERE b.sku IS NULL

Cross database query takes abnormaly long to run despite using indices as shown by EXPLAIN

2 Answers