0
votes

I have the following query in the code I've inherited:

SELECT  a.row2, a.row3
    FROM  table1 a
    JOIN  table2 b ON a.row1 = b.row1
    WHERE  b.row2 IN (
        SELECT  id
            FROM  table3
            WHERE  id IN ($table3_ids)
                     )
    ORDER BY  RAND(); 

[a.row1 is the primary key for table1]

Several questions:

  • Is there a more efficient way to structure this query?

  • I already have an index in table1 on (row1, row2, row4); is it redundant to make a separate index for (row1, row2, row3), or should I just replace the former with an index on (row1, row2, row3, row4)?

  • From the opposite end, I already have an index in table2 on (row1, row2, row3); since it would seem I need an index in table2 for (row1, row2) to optimize this query, would it be redundant to include an index that simply excludes a single element from a different index in the same table?

This is where I'm unclear on how the query engine can know which index is appropriate; when it parses the query, does it first check for matching indices in the table?

  • Lastly (and probably most simply answered), I'm adding indices with this syntax:

    ALTER TABLE table_name ADD KEY (row1, row2, row3);

After creating the index, I'm then manually renaming each index descriptively. Is it possible to include the name of the index in the command?

Many thanks!

2
Perhaps you mean column instead of row?Rick James

2 Answers

1
votes

This is your query:

SELECT a.row2, a.row3
FROM table1 a JOIN
     table2 b
     ON a.row1 = b.row1
WHERE b.row2 IN (SELECT id FROM table3 WHERE id IN ($table3_ids)) 
ORDER BY RAND();

I think the best indexes are: table2(row2, row1) and table1(row1, row2, row3), and table3(id). You can add row4 to the table1 index, but it doesn't make a difference. Also, it is really odd that you named your columns "row" -- for me it results in cognitive dissonance.

Actually, unless you have a typo in your query, you can leave out table3 and just do:

WHERE b.row2 IN ($table3_ids)

Note that in ($table3_ids) requires a string substitution. This cannot be parameterized. That introduces a danger of SQL injection.

If your result set is more than a few hundred, maybe a few thousand rows, then the order by will be significant. If this is the case, you might want to try a different approach to getting the results you want.

1
votes

Some additions to Gordon's answer:

The ALTER TABLE reference shows an optional index_name in the syntax.

IN ( SELECT ... ) is grossly inefficient; turn it into a JOIN:

SELECT  a.row2, a.row3
    FROM  table1 a
    JOIN  table2 b ON a.row1 = b.row1
    JOIN  table3 c ON b.row2 = c.id
    WHERE  c.id IN ($table3_ids) )
    ORDER BY  RAND(); 

or...

SELECT  a.row2, a.row3
    FROM  table1 a
    JOIN  table2 b ON a.row1 = b.row1
    WHERE b.row2  IN ($table3_ids) )
    ORDER BY  RAND(); 

(A possible reason needing c: You are filtering on missing ids in c?)

ORDER BY RAND() is costly. It essentially cannot be optimized unless you also have a LIMIT.