9
votes

I have a query like this:

SELECT fields FROM table
WHERE field1='something' OR field2='something' 
OR field3='something' OR field4='something'

What would be the correct way to index such a table for this query?

A query like this takes an entire second to run! I have one index with all four of those fields in it, so I'd think MySQL would do something like this:

Go through each row in the index thinking this: Is field1 something? How about field2? field3? field4? Ok, nope, go to the next row.


2 Answers

20
votes

You misunderstand how indexes work.

Think of a telephone book (the equivalent of a two-column index on last name, then first name). If I ask you to find all people in the telephone book whose last name is "Smith," you can benefit from the fact that the names are ordered that way; you can assume that the Smiths are grouped together. But if I ask you to find all the people whose first name is "John," you get no benefit from the index. Johns can have any last name, so they are scattered throughout the book and you end up having to search the hard way, from cover to cover.

Now if I ask you to find all people whose last name is "Smith" OR whose first name is "John", you can find the Smiths easily as before, but that doesn't help you at all to find the Johns. They're still scattered throughout the book and you have to search for them the hard way.

It's the same with multi-column indexes in SQL. The index is sorted by the first column, then by the second column in cases of ties in the first column, then by the third column in cases of ties in both the first two columns, etc. It is not sorted by all columns simultaneously. So your multi-column index can narrow the search only for a condition on the left-most column in the index. And because your conditions are joined with OR, even that doesn't save you: a row that fails the test on field1 may still match on field2, field3, or field4, so every row has to be checked anyway.
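
To make the analogy concrete, here is a minimal sketch in MySQL syntax. The `people` table, its columns, and the index name `idx_last_first` are hypothetical, invented only for illustration. Running each statement through EXPLAIN should show which ones can seek into the index; the exact plan depends on your data and MySQL version.

```sql
-- Hypothetical table mirroring the telephone book.
CREATE TABLE people (
  id         INT PRIMARY KEY,
  last_name  VARCHAR(50),
  first_name VARCHAR(50),
  phone      VARCHAR(20),
  INDEX idx_last_first (last_name, first_name)
);

-- Can seek into the index: the condition is on the left-most column.
EXPLAIN SELECT * FROM people WHERE last_name = 'Smith';

-- Can seek as well: left-most column plus the next column.
EXPLAIN SELECT * FROM people
WHERE last_name = 'Smith' AND first_name = 'John';

-- Cannot seek: first_name alone is not a left-most prefix of the index.
EXPLAIN SELECT * FROM people WHERE first_name = 'John';

-- Cannot seek either: the OR means a non-Smith row may still match.
EXPLAIN SELECT * FROM people
WHERE last_name = 'Smith' OR first_name = 'John';
```

For the last two statements, expect either a full table scan or a scan of the entire index rather than a lookup.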

Back to your original question.

What would be the correct way to index such a table for this query?

Create a separate, single-column index on each column. One of these indexes will be a better choice than the others, based on MySQL's estimation of how many I/O operations the index will incur if it is used.

MySQL 5.0 and later also have an index merge optimization, so the query may use more than one index on a given table and then merge the results (for OR conditions, as a union of the matching rows). Otherwise MySQL tends to be limited to using one index per table in a given query.
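
As a rough sketch of that setup: `my_table`, the column types, and the index names below are assumptions standing in for your real schema. Whether the optimizer actually picks index merge depends on its cost estimates, so check with EXPLAIN rather than assuming.

```sql
-- Hypothetical stand-in for the table in the question.
CREATE TABLE my_table (
  id     INT PRIMARY KEY,
  field1 VARCHAR(50),
  field2 VARCHAR(50),
  field3 VARCHAR(50),
  field4 VARCHAR(50)
);

-- One single-column index per searched column.
CREATE INDEX idx_field1 ON my_table (field1);
CREATE INDEX idx_field2 ON my_table (field2);
CREATE INDEX idx_field3 ON my_table (field3);
CREATE INDEX idx_field4 ON my_table (field4);

-- If index merge is chosen, the "type" column of the EXPLAIN output
-- reads index_merge and "Extra" mentions a union of the indexes used.
EXPLAIN SELECT * FROM my_table
WHERE field1 = 'something' OR field2 = 'something'
   OR field3 = 'something' OR field4 = 'something';
```

If the plan still shows a full table scan, the UNION rewrite is worth comparing against it.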

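Another trick that a lot of people use successfully is to do a separate query for each of your indexed columns (each of which should use the respective index) and then UNION the results. Plain UNION, unlike UNION ALL, removes duplicate rows, so a row that matches on more than one column still appears only once.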

SELECT fields FROM table WHERE field1='something' 
UNION
SELECT fields FROM table WHERE field2='something' 
UNION
SELECT fields FROM table WHERE field3='something' 
UNION
SELECT fields FROM table WHERE field4='something' 

One final observation: if you find yourself searching for the same 'something' across four fields, you should reconsider whether all four fields are actually the same thing, and whether you're guilty of designing a table that violates First Normal Form with repeating groups. If so, perhaps field1 through field4 belong in a single column in a child table. Then it becomes a lot easier to index and query:

SELECT fields FROM table INNER JOIN child_table ON table.pk = child_table.fk
WHERE child_table.field = 'something'
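
For illustration, the child-table design might look something like the sketch below. The names `pk`, `fk`, `field`, and `child_table` come from the query above; `parent_table` (standing in for `table`, which is a reserved word in MySQL), the `name` column, the column types, and the index name are assumptions.

```sql
-- Hypothetical parent table; "parent_table" stands in for "table".
CREATE TABLE parent_table (
  pk   INT PRIMARY KEY,
  name VARCHAR(50)
);

-- One row per value that used to live in field1 .. field4.
CREATE TABLE child_table (
  fk    INT NOT NULL,
  field VARCHAR(50) NOT NULL,
  PRIMARY KEY (fk, field),           -- each value stored once per parent row
  INDEX idx_field (field),           -- the single index the search needs
  FOREIGN KEY (fk) REFERENCES parent_table (pk)
);

-- The four-way OR becomes one indexed equality lookup.
SELECT p.*
FROM parent_table AS p
INNER JOIN child_table AS c ON p.pk = c.fk
WHERE c.field = 'something';
```

With this layout the search touches a single index on `child_table.field`, and adding a fifth value per row needs no schema change.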
0
votes

In addition to the previous answer: some RDBMSs, such as MySQL and PostgreSQL, can use an index merge if the optimizer thinks it's a good idea. So you can create a separate index for each field, or create composite indexes such as (field1, field2) and (field3, field4). Finally, you should try several different solutions and choose the one with the best EXPLAIN plan.