You could use an expression index, but you shouldn't. Keep it simple, and use a plain b-tree.
An expression index can be created on colname IS NOT NULL
:
test=> CREATE TABLE blah(name text);
CREATE TABLE
test=> CREATE INDEX name_notnull ON blah((name IS NOT NULL));
CREATE INDEX
test=> INSERT INTO blah(name) VALUES ('a'),('b'),(NULL);
INSERT 0 3
test=> SET enable_seqscan = off;
SET
craig=> SELECT * FROM blah WHERE name IS NOT NULL;
name
------
a
b
(2 rows)
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on blah (cost=9.39..25.94 rows=1303 width=32)
Filter: (name IS NOT NULL)
-> Bitmap Index Scan on name_notnull (cost=0.00..9.06 rows=655 width=0)
Index Cond: ((name IS NOT NULL) = true)
(4 rows)
test=> SET enable_bitmapscan = off;
SET
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using name_notnull on blah (cost=0.15..55.62 rows=1303 width=32)
Index Cond: ((name IS NOT NULL) = true)
Filter: (name IS NOT NULL)
(3 rows)
... but Pg doesn't realise that it's also usable for IS NULL
:
test=> EXPLAIN SELECT * FROM blah WHERE name IS NULL;
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on blah (cost=10000000000.00..10000000023.10 rows=7 width=32)
Filter: (name IS NULL)
(2 rows)
and even transforms NOT (name IS NOT NULL)
into name IS NULL
, which is usually what you want.
test=> EXPLAIN SELECT * FROM blah WHERE NOT (name IS NOT NULL);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on blah (cost=10000000000.00..10000000023.10 rows=7 width=32)
Filter: (name IS NULL)
(2 rows)
so you're actually better off with two disjoint expression indexes, one on the null and one on the non-null set.
test=> DROP INDEX name_notnull ;
DROP INDEX
test=> CREATE INDEX name_notnull ON blah((name IS NOT NULL)) WHERE (name IS NOT NULL);
CREATE INDEX
test=> EXPLAIN SELECT * FROM blah WHERE name IS NOT NULL;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using name_notnull on blah (cost=0.13..8.14 rows=3 width=32)
Index Cond: ((name IS NOT NULL) = true)
(2 rows)
test=> CREATE INDEX name_null ON blah((name IS NULL)) WHERE (name IS NULL);
CREATE INDEX
craig=> EXPLAIN SELECT * FROM blah WHERE name IS NULL;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using name_null on blah (cost=0.12..8.14 rows=1 width=32)
Index Cond: ((name IS NULL) = true)
(2 rows)
This is pretty gruesome though. For most sensible uses I'd just use a plain b-tree index. The index size improvement isn't too exciting, at least for small-ish inputs, like the dummy I created with a bunch of md5 values:
test=> SELECT pg_size_pretty(pg_relation_size('blah'));
pg_size_pretty
----------------
9416 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('blah_name'));
pg_size_pretty
----------------
7984 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('name_notnull'));
pg_size_pretty
----------------
2208 kB
(1 row)
test=> SELECT pg_size_pretty(pg_relation_size('name_null'));
pg_size_pretty
----------------
2208 kB
(1 row)
create index
to at least minimize its size. – Uwe Allnercreate index i on t (coalesce('NULL',col));
to actually indexNULL
and avoid separating one nulls from other nulls – Vao TsunNULL
is indexed. Where do you get the idea that it isn't? – Craig Ringer