MySQL query with a subquery takes significantly longer when using a full text in a where, rather than an order by

Question

I have a query which sometimes runs really fast and sometimes incredibly slowly depending on the number of results that match a full text boolean search within the query.

The query also contains a subquery.

Without the subquery the main query is always fast.

The subquery by itself is also always fast.

But together they are very slow.

Removing the full text search from a where clause and instead ordering by the full text search is really fast.

So it's only slow then when using a full text search search within a where.

That's the simple readable overview, exact queries are below.

I've included the schema at the bottom although it will be difficult to replicate without my dataset which unfortunately I can't share.

I've included the counts and increments in the example queries to give some indication of the data size involved.

I actually have a solution by simply accepting a result which includes irrelevant data and then filtering out that data in PHP. But i'd like to understand why my queries are performing poorly and how I might be able to resolve the issue in MySQL.

In particular i'm confused why it's fast with the full text search in an order by but not with it in the where.

The query I want (slow)

I've got a query that looks like this:

select 
  *, 
  MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score 
from 
  `app_records` 
where 
  `id` in (
    select 
      distinct(app_record_parents.record_id) 
    from 
      `app_group_records` 
      inner join `app_record_parents`
        on `app_record_parents`.`parent_id` = `app_group_records`.`record_id` 
    where 
      `group_id` = 3
  ) 
  and
    MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by 
  `relevance_score` desc 
limit 
  10;

This query takes 10 seconds.

This is too long for this sort of query, I need to be looking at milliseconds.

But the two queries run really fast when run by themselves.

The sub select by itself

select distinct(app_record_parents.record_id) 
from
   `app_group_records` 
   inner join
      `app_record_parents` 
      on `app_record_parents`.`parent_id` = `app_group_records`.`record_id` 
where
   `group_id` = 3

The sub select by itself takes 7ms with 2600 results.

The main query without the sub select

select 
  *, 
  MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score 
from 
  `app_records` 
where 
  MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by 
  `relevance_score` desc 
limit 
  10;

The main query without the sub select takes 6ms with 2971 possible results (obviously there's a limit 10 there).

It's faster with less results

The same query but matching against "Old Traf" rather than "Old Tra" takes 300ms.

The number of results are obviously different when using "Old Traf" vs "Old Tra".

Results of full query

"Old Tra": 9
"Old Traf": 2

Records matching the full text search

"Old Tra": 2971
"Old Traf": 120

Removing the where solves the issue

Removing the where and returning all records sorted by the relevance score is really fast and still gives me the experience i'd like:

select 
  *, 
  MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score 
from 
  `app_records` 
where 
  `id` in (
    select 
      distinct(app_record_parents.record_id) 
    from 
      `app_group_records` 
      inner join `app_record_parents`
        on `app_record_parents`.`parent_id` = `app_group_records`.`record_id` 
    where 
      `group_id` = 3
  )
order by 
  `relevance_score` desc 
limit 
  10;

But then I need to filter out irrelevant results in code

I'm using this in php so I can now filter my results to remove any that have a 0 relevance score (if there are only 2 matches for instance, 8 random results with a relevance score of 0 will still be included, since i'm not using a where).

array_filter($results, function($result) {
    return $result->relevance_score > 0;
});

Obviously this is really quick so it's not really a problem.

But I still don't understand what's wrong with my queries.

So I do have a fix as outlined above. But I still don't understand why my queries are slow.

It's clear that the number of possible results from the full text search is causing an issue, but exactly why and how to get around this issue is beyond me.

Table Schema

Here are my tables

CREATE TABLE `app_records` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `app_models_name_IDX` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=960004 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;



CREATE TABLE `app_record_parents` (
  `record_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned DEFAULT NULL,
  KEY `app_record_parents_record_id_IDX` (`record_id`) USING BTREE,
  KEY `app_record_parents_parent_id_IDX` (`parent_id`) USING BTREE,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;



CREATE TABLE `app_group_records` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `group_id` int(10) unsigned NOT NULL,
  `record_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

A note on what the queries are doing

The subquery is getting a list of record_id's that belong to group_id 3.

So while there are 960004 records in app_records there are only 2600 which belong to group 3 and it is against these 2600 that i'm trying to query for name's that match "Old Tra",

So the subquery is getting a list of these 2600 record_id's and then i'm doing a WHERE id IN <subquery> to get the relevant results from app_records.

EDIT: Using joins is equally slow

Just to add using joins has the same issue. Taking 10 seconds for "Old Tra" and 400ms for "Old Traf" and being very fast when not using a full text search in a where.

SELECT 
  app_records.*, 
  MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score 
FROM 
  `app_records` 
  INNER JOIN app_record_parents ON app_records.id = app_record_parents.record_id 
  INNER JOIN app_group_records ON app_group_records.record_id = app_record_parents.parent_id 
WHERE 
  `group_id` = 3 
  AND MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) 
GROUP BY 
  app_records.id 
LIMIT 
  10;

Rick James Rick James · Accepted Answer · 2018-12-18T20:07:21

app_record_parents

Has no PRIMARY KEY; hence may have unnecessary duplicate pairs.
Does not have optimal indexes.
See this for several tips.
Perhaps app_group_records is also many-many?

Are you are searching for Old Tra* anywhere in name? If not, then why not use WHERE name LIKE 'Old Tra%. In this case, add INDEX(name).

Note: When FULLTEXT is involved, it is picked first. Please provide EXPLAIN SELECT to confirm this.

This formulation may be faster:

select  *,
        MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
    from  `app_records` AS r
    WHERE MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
      AND EXISTS ( SELECT 1
              FROM app_group_records AS gr
              JOIN app_record_parents AS rp  ON rp.parent_id = gr.record_id
              WHERE gr.group_id = 3
                AND r.id = rp.record_id )
    ORDER BY relevance_score DESC
    LIMIT 10

Indexes:

gr:  (group_id, record_id)  -- in this order
r:   nothing but the FULLTEXT will be used
rp:  (record_id, parent_id)  -- in this order