1
votes
Table_A
id int(11)
sku varchar(24)
Total number of records 250000, sku indexed

Table_B
id int(11)
sku varchar(16)

Total number of records - 180000, sku indexed

The following query has been running for over 30 minutes

Select count(*) 
from Table_A 
where not exists(select 1 from Table_B where Table_B.sku = Table_A.sku)

while Select count(*) from Table_B where exists(select 1 from Table_A where Table_A.sku = Table_B.sku) is executed in less than a second.

Could somebody explain what could be happening?

Forgot to mention Table_A and Table_B are in different databases


mysql> describe db1.table_a;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| prodcut_id | int(11)     | NO   | PRI | NULL    |       |
| sku        | varchar(24) | YES  | MUL | NULL    |       |
+------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)



mysql> describe db2.table_b;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| sno   | int(11)     | NO   | PRI | NULL    |       |
| sku   | varchar(24) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)




mysql> explain select count(*) from db2.table_b where not exists(select 1 from db1.table_a where table_a.sku = table_b.sku);
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | table_b | index | NULL          | sku  | 27      | NULL | 181286 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | table_a | ref   | sku           | sku  | 75      | func |      1 | Using where; Using index |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)


mysql> explain select count(*) from db1.table_a where not exists(select 1 from db2.table_b where table_b.sku = table_a.sku);
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | table_a | index | NULL          | sku  | 75      | NULL | 277030 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | table_b | index | NULL          | sku  | 27      | NULL | 181286 | Using where; Using index |
+----+--------------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

2
Try EXPLAIN that queryUnlink
DEPENDENT SUBQUERY isn't very good, try JOIN solution as @Notulysses sugesstUnlink
maybee problem should be, that you have sku in first table declared as varchar(24) and in second as varchar(16)Unlink
Thanks guys for your replies. The problem was with the collation, they were different. Changed the collation and the query completed in about 2 secs.a5995
@a5995 Thanks for sharing your findings that seem a very likely explanation if MySQL has to convert encodings/collations in the JOIN conditions. This would moreover explain a similar behavior I experienced myself with an imported MySQL dump. I ended up reimporting the data differently and the problem was solved so I didn't care any longer but the underlying reasons were probably the same. Actually, I strongly think that you should post this as an answer to your own question and accept it as it will definitely help others who do cross-database queries in the future.VH-NZZ

2 Answers

2
votes

In MySQL LEFT JOIN should have a better performance than NOT EXISTS:

SELECT COUNT(a.*) 
FROM Table_A a LEFT JOIN Table_B b ON a.sku = b.sku
WHERE b.sku IS NULL
1
votes

As suggested posting this as the answer. Found that the problem was with the collation. Changed the collation of the tables to be identical and the query completed in about 2 secs.