I have a query that is giving me problems and I can't understand why MySQL's query optimizer is behaving the way it is. Here is the background info:
I have 3 tables. Two are relatively small and one is large.
Table 1 (very small, 727 rows):
CREATE TABLE
ipa
(ipa_id
int(11) NOT NULL AUTO_INCREMENT,ipa_code
int(11) DEFAULT NULL,ipa_name
varchar(100) DEFAULT NULL,payorcode
varchar(2) DEFAULT NULL,compid
int(11) DEFAULT '2'
PRIMARY KEY (ipa_id
),
KEYipa_code
(ipa_code
) ) ENGINE=MyISAM
Table 2 (smallish, 59455 rows):
CREATE TABLE
assign_ipa
(assignid
int(11) NOT NULL AUTO_INCREMENT,ipa_id
int(11) NOT NULL,userid
int(11) NOT NULL,username
varchar(20) DEFAULT NULL,compid
int(11) DEFAULT NULL,PayorCode
char(10) DEFAULT NULL
PRIMARY KEY (assignid
),
UNIQUE KEYassignid
(assignid
,ipa_id
),
KEYipa_id
(ipa_id
)
) ENGINE=MyISAM
Table 3 (large, 24,711,730 rows):
CREATE TABLE
master_final
(IPA
int(11) DEFAULT NULL,MbrCt
smallint(6) DEFAULT '0',PayorCode
varchar(4) DEFAULT 'WC',
KEYidx_IPA
(IPA
)
) ENGINE=MyISAM DEFAULT
Now for the query. I'm doing a 3-way join using the first two smaller tables to essentially subset the big table on one of it's indexed values. Basically, I get a list of IDs for a user, SJOnes and query the big file for those IDs.
mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
WHERE assign_ipa.username = 'SJones'
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ALL
possible_keys: idx_IPA
key: NULL
key_len: NULL
ref: NULL
rows: 24711730
Extra: Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: ipa
type: ref
possible_keys: PRIMARY,ipa_code
key: ipa_code
key_len: 5
ref: wc_test.master_final.IPA
rows: 1
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
3 rows in set (0.00 sec)
This query takes forever (like 30 minutes!). The explain statement tells me why, it's doing a full table scan on the big table even though there is a perfectly good index. It's not using it. I don't understand this. I can look at the query and see that it's only needs to query a couple of IDs from the big table. If I can do it, why can't MySQL's optimizer do it?
To illustrate, here are the IDs associated with 'SJones':
mysql> select username, ipa_id from assign_ipa where username='SJones';
+----------+--------+
| username | ipa_id |
+----------+--------+
| SJones | 688 |
| SJones | 689 |
+----------+--------+
2 rows in set (0.02 sec)
Now, I can rewrite the query substituting the ipa_id values for the username in the where clause. To me this is equivalent to the original query. MySQL sees it differently. If I do this, the optimizer makes use of the index on the big table.
mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
*WHERE assign_ipa.ipa_id in ('688','689')*
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: ipa
type: range
possible_keys: PRIMARY,ipa_code
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where; Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ref
possible_keys: idx_IPA
key: idx_IPA
key_len: 5
ref: wc_test.ipa.ipa_code
rows: 34953
Extra: Using where
3 rows in set (0.00 sec)
The only thing I've changed is a where clause that doesn't even directly hit the big table. And yet, the optimizer uses the index 'idx_IPA' on the big table and the full table scan is no longer used. The query when re-written like this is very fast.
OK, that's a lot of background. Now my question. Why should the where clause matter to the optimizer? Either where clause will return the same result set from the smaller table, and yet I'm getting dramatically different results depending on which one I use. Obviously, I want to use the where clause containing the username rather than trying to pass all associated IDs to the query. As written though, this is not possible?
- Can someone explain why this is happening?
- How might I rewrite my query to avoid the full table scan?
Thanks for sticking with me. I know its a very longish question.