Is there any performance gain in indexing a boolean field?

117

votes

I'm just about to write a query that includes a WHERE isok=1. As the name implies, isok is a boolean field (actually a TINYINT(1) UNSIGNED that is set to 0 or 1 as needed).

Is there any performance gain in indexing this field? Would the engine (InnoDB in this case) perform better or worse looking up the index?

mysqlindexinginnodb

possible duplicate of Indexing boolean fields – Maxim Krizhanovsky

132

votes

Just to put a finer point on several other answers here, since in my experience, those looking at questions like this are in the same boat we were, we've all heard that indexing Boolean fields is pointless, and yet...

We have a table with about 4 million rows, only about 1000 or so at a time will have a Boolean switch flagged and that's what we search against. Adding an index on our Boolean field sped up queries by orders of magnitude, it went from about 9+ seconds to a fraction of a second.

95

votes

Not really. You should think about it like a book. If there were only 3 kinds of words in a book and you index all of them, you would have the same number of index pages as normal pages.

There would be a performance gain if there are relatively few records of one value. For example, if you have 1000 records and 10 of them are TRUE, then it would be useful if you searching with isok = 1

As Michael Durrant mentioned, it also makes writes slower.

EDIT: Possible duplication: Indexing boolean fields

Here it explains that even if you have an index, if you have too many records it doesn't use the index anyways. MySQL not using index when checking = 1 , but using it with = 0

35

votes

It depends on the actual queries and the selectivity of the index/query combination.

Case A: condition WHERE isok = 1 and nothing else there:

SELECT *
FROM tableX
WHERE isok = 1

If the index is selective enough (say you have 1M rows and only 1k have isok = 1), then the SQL engine will probably use the index and be faster than without it.
If the index is not selective enough (say you have 1M rows and more than 100k have isok = 1), then the SQL engine will probably not use the index and do a table scan.

Case B: condition WHERE isok = 1 and more stuff:

SELECT *
FROM tableX
WHERE isok = 1
  AND another_column = 17

Then, it depends on what other indexes you have. An index on another_column would probably be more selective than the index on isok which has only two possible values. An index on (another_column, isok) or (isok, another_column) would be even better.

12

votes

It depends on the distribution of the data.

Imagine I had a book with 1000 closely typed pages, and the only words in my book were 'yes' and 'no' repeated over and over and distributed randomly. If I was asked to circle all the instances of 'yes', would an index in the back of the book help? It depends.

If there was a half-and-half random distribution of yes's and no's, then looking up in the index wouldn't help. The index would make the book a lot bigger, and anyway I'd be quicker just to start from the front and work my way through each page looking for all the instances of 'yes' and circling them, rather than looking up each item in the index and then taking the reference from the index entry to the page that it refers to.

But if there were, say, just ten instances of 'yes' in my thousand-page book and everything else was just millions of no's, then an index would save me loads of time in finding those ten instances of 'yes' and circling them.

It's the same in databases. If it's a 50:50 distribution, then an index isn't going to help - the database engine is better off just ploughing through the data from start to finish (full table scan), and the index would just make the database bigger, and slower to write and update. But if it is something like a 4000:1 distribution (as per oucil in this thread), then an index seek can speed it up hugely, if it is the 1 in 4000 items that you are looking for.

5

votes

No, usually not.

You usually index fields for searching when they have high selectivity/cardinality. A boolean field's cardinality is very low in most tables. It would also make your writes fractionally slower.

3

votes

Actually this depends on queries you run. But, generally yes, as well as indexing a field of any other type.

0

votes

Yes an index will improve performance, check the output of EXPLAIN with and without the index.

From the docs:

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data.

I think it's also safe to say an index will not DECREASE performance in this case, so you have only to gain from it.

Is there any performance gain in indexing a boolean field?

7 Answers