Optimizing and indexing query with random component

Question

I have the following query in the code I've inherited:

SELECT  a.row2, a.row3
    FROM  table1 a
    JOIN  table2 b ON a.row1 = b.row1
    WHERE  b.row2 IN (
        SELECT  id
            FROM  table3
            WHERE  id IN ($table3_ids)
                     )
    ORDER BY  RAND();

[a.row1 is the primary key for table1]

Several questions:

Is there a more efficient way to structure this query?
I already have an index in table1 on (row1, row2, row4); is it redundant to make a separate index for (row1, row2, row3), or should I just replace the former with an index on (row1, row2, row3, row4)?
From the opposite end, I already have an index in table2 on (row1, row2, row3); since it would seem I need an index in table2 for (row1, row2) to optimize this query, would it be redundant to include an index that simply excludes a single element from a different index in the same table?

This is where I'm unclear on how the query engine can know which index is appropriate; when it parses the query, does it first check for matching indices in the table?

Lastly (and probably most simply answered), I'm adding indices with this syntax:

ALTER TABLE table_name ADD KEY (row1, row2, row3);

After creating the index, I'm then manually renaming each index descriptively. Is it possible to include the name of the index in the command?

Many thanks!

Gordon Linoff Gordon Linoff · Accepted Answer · 2015-07-24T01:10:23

This is your query:

SELECT a.row2, a.row3
FROM table1 a JOIN
     table2 b
     ON a.row1 = b.row1
WHERE b.row2 IN (SELECT id FROM table3 WHERE id IN ($table3_ids)) 
ORDER BY RAND();

I think the best indexes are: table2(row2, row1) and table1(row1, row2, row3), and table3(id). You can add row4 to the table1 index, but it doesn't make a difference. Also, it is really odd that you named your columns "row" -- for me it results in cognitive dissonance.

Actually, unless you have a typo in your query, you can leave out table3 and just do:

WHERE b.row2 IN ($table3_ids)

Note that in ($table3_ids) requires a string substitution. This cannot be parameterized. That introduces a danger of SQL injection.

If your result set is more than a few hundred, maybe a few thousand rows, then the order by will be significant. If this is the case, you might want to try a different approach to getting the results you want.

Optimizing and indexing query with random component

2 Answers