I have a query which sometimes runs really fast and sometimes incredibly slowly depending on the number of results that match a full text boolean search within the query.
The query also contains a subquery.
Without the subquery the main query is always fast.
The subquery by itself is also always fast.
But together they are very slow.
Removing the full text search from a where clause and instead ordering by the full text search is really fast.
So it's only slow then when using a full text search search within a where.
That's the simple readable overview, exact queries are below.
I've included the schema at the bottom although it will be difficult to replicate without my dataset which unfortunately I can't share.
I've included the counts and increments in the example queries to give some indication of the data size involved.
I actually have a solution by simply accepting a result which includes irrelevant data and then filtering out that data in PHP. But i'd like to understand why my queries are performing poorly and how I might be able to resolve the issue in MySQL.
In particular i'm confused why it's fast with the full text search in an order by but not with it in the where.
The query I want (slow)
I've got a query that looks like this:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
and
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
This query takes 10 seconds.
This is too long for this sort of query, I need to be looking at milliseconds.
But the two queries run really fast when run by themselves.
The sub select by itself
select distinct(app_record_parents.record_id)
from
`app_group_records`
inner join
`app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
The sub select by itself takes 7ms with 2600 results.
The main query without the sub select
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
The main query without the sub select takes 6ms with 2971 possible results (obviously there's a limit 10 there).
It's faster with less results
The same query but matching against "Old Traf" rather than "Old Tra" takes 300ms.
The number of results are obviously different when using "Old Traf" vs "Old Tra".
Results of full query
- "Old Tra": 9
- "Old Traf": 2
Records matching the full text search
- "Old Tra": 2971
- "Old Traf": 120
Removing the where solves the issue
Removing the where and returning all records sorted by the relevance score is really fast and still gives me the experience i'd like:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
order by
`relevance_score` desc
limit
10;
But then I need to filter out irrelevant results in code
I'm using this in php so I can now filter my results to remove any that have a 0 relevance score (if there are only 2 matches for instance, 8 random results with a relevance score of 0 will still be included, since i'm not using a where).
array_filter($results, function($result) {
return $result->relevance_score > 0;
});
Obviously this is really quick so it's not really a problem.
But I still don't understand what's wrong with my queries.
So I do have a fix as outlined above. But I still don't understand why my queries are slow.
It's clear that the number of possible results from the full text search is causing an issue, but exactly why and how to get around this issue is beyond me.
Table Schema
Here are my tables
CREATE TABLE `app_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `app_models_name_IDX` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=960004 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_record_parents` (
`record_id` int(10) unsigned NOT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
KEY `app_record_parents_record_id_IDX` (`record_id`) USING BTREE,
KEY `app_record_parents_parent_id_IDX` (`parent_id`) USING BTREE,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_group_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
A note on what the queries are doing
The subquery is getting a list of record_id's that belong to group_id 3.
So while there are 960004 records in app_records there are only 2600 which belong to group 3 and it is against these 2600 that i'm trying to query for name's that match "Old Tra",
So the subquery is getting a list of these 2600 record_id's and then i'm doing a WHERE id IN <subquery>
to get the relevant results from app_records.
EDIT: Using joins is equally slow
Just to add using joins has the same issue. Taking 10 seconds for "Old Tra" and 400ms for "Old Traf" and being very fast when not using a full text search in a where.
SELECT
app_records.*,
MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
FROM
`app_records`
INNER JOIN app_record_parents ON app_records.id = app_record_parents.record_id
INNER JOIN app_group_records ON app_group_records.record_id = app_record_parents.parent_id
WHERE
`group_id` = 3
AND MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE)
GROUP BY
app_records.id
LIMIT
10;