78
votes

This is probably a really stupid question, but is there going to be much benefit in indexing a boolean field in a database table?

Given a common situation, like "soft-delete" records which are flagged as inactive, and hence most queries include WHERE deleted = 0, would it help to have that field indexed on its own, or should it be combined with the other commonly-searched fields in a different index?

6
@AmirAliAkbari: Oh! No! A circular reference! Hopefully S.O. won't explode! – Paul

6 Answers

59
votes

No.

You index fields that are searched upon and have high selectivity/cardinality. A boolean field has only two distinct values, so its cardinality is about as low as it gets in nearly any table. If anything, the index will make your writes slower (by an oh-so-tiny amount).

Maybe you would make it the first field in the clustered index if every query took into account soft deletes?

17
votes

What about a deleted_at DATETIME column? There are two benefits.

  1. If you need a unique column like name, you can create and soft-delete a record with the same name multiple times (if you use a unique index on the columns deleted_at AND name)
  2. You can search for recently deleted records.

Your query could look like this:

SELECT * FROM xyz WHERE deleted_at IS NULL
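
To make the two benefits concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name xyz and the columns deleted_at and name come from the answer; the index name, the helper functions, and the sample timestamps are assumptions made up for illustration. One caveat: SQLite treats NULLs as distinct in a unique index, so in this engine the index alone does not stop two active rows from sharing a name. Other engines handle NULLs differently, so check yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        deleted_at TEXT            -- NULL means the record is still active
    );
    -- Unique on (deleted_at, name), as the answer suggests.
    -- Note: SQLite treats NULL values as distinct in a unique index.
    CREATE UNIQUE INDEX idx_xyz_deleted_name ON xyz (deleted_at, name);
""")

def create(name):
    """Insert a new active record (hypothetical helper)."""
    conn.execute("INSERT INTO xyz (name) VALUES (?)", (name,))

def soft_delete(name, when):
    """Flag the active record with this name as deleted (hypothetical helper)."""
    conn.execute(
        "UPDATE xyz SET deleted_at = ? WHERE name = ? AND deleted_at IS NULL",
        (when, name),
    )

# Benefit 1: the same name can be created and soft-deleted repeatedly.
create("report")
soft_delete("report", "2024-01-01 10:00:00")
create("report")
soft_delete("report", "2024-02-01 10:00:00")
create("report")

active = conn.execute("SELECT name FROM xyz WHERE deleted_at IS NULL").fetchall()

# Benefit 2: search for records deleted after a given point in time.
recently_deleted = conn.execute(
    "SELECT name, deleted_at FROM xyz WHERE deleted_at >= ? ORDER BY deleted_at",
    ("2024-01-15 00:00:00",),
).fetchall()

print(active)
print(recently_deleted)
```

The timestamps are stored as ISO-8601 text here, which sorts correctly as plain strings; a real schema would use whatever date type your database provides.
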
7
votes

I think it would help, especially in covering indices.

How much/little is of course dependent on your data and queries.

You can have theories of all sorts about indices, but the final answers are given by the database engine in a database with real data. And often you are surprised by the answer (or maybe my theories are just too bad ;) ).

Examine the query plan of your queries and determine if the queries can be improved, or if the indices can be improved. It's quite simple to alter indices and see what difference it makes.
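
As a rough illustration of that workflow, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN to compare the plan before and after adding an index. The pages table, its columns, the index name, and the sample data are all assumptions for illustration; your engine will have its own EXPLAIN syntax and its own opinions about which index to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical data: every tenth row is soft-deleted.
conn.executemany(
    "INSERT INTO pages (name, deleted) VALUES (?, ?)",
    [(f"page-{i}", 1 if i % 10 == 0 else 0) for i in range(1000)],
)

QUERY = "SELECT name FROM pages WHERE deleted = 0 AND name = 'page-42'"

def plan(sql):
    """Return the human-readable plan text; it is the last column of each row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(QUERY)

# Combine the boolean flag with another commonly searched column.
conn.execute("CREATE INDEX idx_pages_deleted_name ON pages (deleted, name)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should report a full table scan and the second should mention the new index. Dropping the index and trying a different column order is an easy way to see what the planner prefers for your own queries.
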

2
votes

I think it would help if you were using a view (where deleted = 0) and you are regularly querying from this view.
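
A minimal sketch of that setup, using Python's built-in sqlite3 module. The pages table, the active_pages view, and the index name are hypothetical. Keep in mind that an ordinary view is just a stored query, so whether the index on deleted actually gets used is still up to the planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id      INTEGER PRIMARY KEY,
        title   TEXT,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO pages (title, deleted)
    VALUES ('home', 0), ('old news', 1), ('about', 0);

    -- Index on the flag, plus a view that hides soft-deleted rows.
    CREATE INDEX idx_pages_deleted ON pages (deleted);
    CREATE VIEW active_pages AS
        SELECT id, title FROM pages WHERE deleted = 0;
""")

# Application code queries the view and never repeats the deleted = 0 filter.
active_titles = [
    row[0] for row in conn.execute("SELECT title FROM active_pages ORDER BY id")
]
print(active_titles)
```
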

2
votes

I think if you refer to your boolean fields in many queries, it would make sense to move them to a separate table, for example DeletedPages or SpecialPages, which would hold many boolean-type fields such as is_deleted, is_hidden, is_really_deleted, requires_higher_user, etc. You would then use joins to get them.

Typically this table would be smaller, and you would get some advantage from using joins, especially as far as code readability and maintainability are concerned. And for this type of query:

SELECT * FROM pages WHERE is_deleted = 1

It would be faster to have it implemented like this:

SELECT pages.* FROM pages
INNER JOIN DeletedPages ON pages.id = DeletedPages.page_id
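
For what it's worth, here is a small sketch of the two approaches side by side, using Python's built-in sqlite3 module. The schema and sample rows are assumptions, and the is_deleted flag is kept on pages only so both queries can be compared. It shows that the queries return the same rows; it says nothing about which one is faster on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id         INTEGER PRIMARY KEY,
        title      TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Side table holding one row per deleted page (hypothetical schema).
    CREATE TABLE DeletedPages (
        page_id INTEGER PRIMARY KEY REFERENCES pages (id)
    );
    INSERT INTO pages (id, title, is_deleted)
    VALUES (1, 'home', 0), (2, 'old news', 1), (3, 'draft', 1);
    INSERT INTO DeletedPages (page_id) VALUES (2), (3);
""")

# Approach 1: filter on the boolean flag.
by_flag = conn.execute(
    "SELECT id, title FROM pages WHERE is_deleted = 1 ORDER BY id"
).fetchall()

# Approach 2: join against the small side table.
by_join = conn.execute(
    "SELECT pages.id, pages.title FROM pages "
    "INNER JOIN DeletedPages ON pages.id = DeletedPages.page_id "
    "ORDER BY pages.id"
).fetchall()

print(by_flag == by_join)
```
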

I think I read somewhere that in MySQL a field needs a cardinality of at least 3 for indexing to work on that field, but please confirm this.

0
votes

If you are using a database that supports bitmap indexes (such as Oracle), then such an index on a boolean column will be much more useful than it would be otherwise.