PostgreSQL: Create an index to quickly distinguish NULL from non-NULL values

Question

Consider a SQL query with the following WHERE predicate:

...
WHERE name IS NOT NULL
...

Where name is a textual field in PostgreSQL.

No other query checks any textual property of this value, only whether it is NULL or not. Therefore, a full btree index seems like overkill, even though it supports this distinction:

Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index.

What's the right PostgreSQL index to quickly distinguish NULLs from non-NULLs?

You can add your predicate to the CREATE INDEX statement to at least minimize its size. — Uwe Allner
As your only choices are btree, gist, gin, and hash (which is discouraged), I don't see another way. — Uwe Allner
You can create index i on t (coalesce(col,'NULL')); to actually index NULL and avoid separating one NULL from other NULLs — Vao Tsun
@VaoTsun o_O? NULL is indexed. Where do you get the idea that it isn't? — Craig Ringer
@CraigRinger yes, it is indexed, but all NULLs are different, and I believe Adam wants to see NULLs as the same. Adam?.. — Vao Tsun

jpmc26 · Accepted Answer · 2015-08-12T13:26:49

I'm interpreting your claim that it's "overkill" in two ways: in terms of complexity (using a B-Tree instead of just a list) and in terms of space/performance.

For complexity, it's not overkill. A B-Tree index is preferable because deletes from it will be faster than from some kind of "unordered" index (for lack of a better term). (An unordered index would require a full index scan just to delete.) In light of that fact, any gains from an unordered index would usually be outweighed by the detriments, so the development effort isn't justified.

For space and performance, though, if you want a highly selective index for efficiency, you can include a WHERE clause on an index, as noted in the fine manual:

CREATE INDEX ON my_table (name) WHERE name IS NOT NULL;

Note that you'll only see benefits from this index if it allows PostgreSQL to ignore a large number of rows when executing your query. E.g., if 99% of the rows have name IS NOT NULL, the index isn't buying you anything over just letting a full table scan happen; in fact, it would be less efficient (as @CraigRinger notes) since it would require extra disk reads. If, however, only 1% of rows have name IS NOT NULL, then this represents huge savings, as PostgreSQL can ignore most of the table for your query. If your table is very large, even eliminating 50% of the rows might be worth it. This is a tuning problem, and whether the index is valuable is going to depend heavily on the size and distribution of the data.
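
One way to see where a table falls on that spectrum is to measure the non-NULL fraction and then look at the plan PostgreSQL actually picks. This is a rough sketch, assuming the my_table and name identifiers from the example above; the SELECT * query is only a stand-in for your real query.

```sql
-- Exact fraction of rows with a non-NULL name.
-- count(name) counts only non-NULL values; NULLIF guards against an empty table.
SELECT count(name)::numeric / NULLIF(count(*), 0) AS non_null_fraction
FROM my_table;

-- The planner's own estimate of the NULL fraction (populated by ANALYZE).
SELECT null_frac
FROM pg_stats
WHERE tablename = 'my_table'
  AND attname = 'name';

-- Check whether the partial index is chosen and how many buffers are read.
-- Note that EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE name IS NOT NULL;
```

If the plan shows a sequential scan even with the partial index in place, the planner has most likely judged that too many rows match for the index to pay off.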

Additionally, there is very little gain in terms of space if you still need another index for the name IS NULL rows. See Craig Ringer's answer for details.
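
To check the space argument on your own data, you could build both variants and compare their on-disk sizes. This is a sketch only: the index names are hypothetical, and you would want to try it on a copy of the table rather than in production.

```sql
-- Variant A: two partial indexes, one per case.
CREATE INDEX my_table_name_not_null_idx ON my_table (name) WHERE name IS NOT NULL;
CREATE INDEX my_table_name_null_idx     ON my_table (name) WHERE name IS NULL;

-- Variant B: one ordinary B-Tree index covering both cases.
CREATE INDEX my_table_name_full_idx ON my_table (name);

-- Compare the sizes of all indexes on the table.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'my_table'
ORDER BY indexrelname;

-- Drop whichever variant you decide not to keep, e.g.:
-- DROP INDEX my_table_name_full_idx;
```

If the two partial indexes together come out close to the size of the full index, that matches the point above that splitting buys little space when both cases must be indexed.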
