0
votes

I created an multi column index where I indexed 3 properties of the same jsonb column. It doesn't look like the index is being used even if the where clause uses the exact 3 properties. Is this not supported?

I'm using postgres 9.6 and tried both btree and gin

Here is the gin index

CREATE INDEX custom__observation__multicol  
    ON fhir.observation USING gin  
    (((resource -> 'subject'::text) -> 'reference'::text),  
(resource -> 'effectiveDateTime'::text),  
((resource -> 'code'::text) -> 'coding'::text))  
TABLESPACE pg_default;

The query plan looks like this

Bitmap Heap Scan on observation  (cost=600.20..2701.74 rows=1 width=766) (actual time=17749.389..25453.827 rows=1 loops=1)Filter: (is_latest AND (NOT is_deleted) AND ((resource -> 'effectiveDateTime'::text) = to_jsonb('2019-07-31T18:07:52Z'::text)) AND (((resource -> 'subject'::text) ->> 'reference'::text) = 'Patient/e69583dc510c4ed59e362e3ac8c803b5'::text))
      Recheck Cond: (((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb)
      Filter: (is_latest AND (NOT is_deleted) AND ((resource -> 'effectiveDateTime'::text) = to_jsonb('2019-07-31T18:07:52Z'::text)) AND (((resource -> 'subject'::text) ->> 'reference'::text) = 'Patient/e69583dc510c4ed59e362e3ac8c803b5'::text))
      Rows Removed by Filter: 93330  
      Heap Blocks: exact=52088
      Buffers: shared hit=373 read=52392 written=25
      ->  Bitmap Index Scan on custom__observation__nulticol  (cost=0.00..600.20 rows=560 width=0) (actual time=65.817..65.817 rows=93331 loops=1)
          Index Cond: (((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb)
          Buffers: shared hit=373 read=304
Planning time: 0.143 ms
Execution time: 25453.872 ms

I also ran the analyze command against the table but it doesn't look like all three indexes are applied. The query takes about 30 secs to return in pgAdmin

I enabled the btree_gin using "CREATE EXTENSION btree_gin" Query plan looks the same see below

    QUERY PLAN                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on observation  (cost=92.19..2190.12 rows=1 width=768) (actual time=23580.396..31767.965 rows=1 loops=1)
   Recheck Cond: (((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb)
   Filter: (is_latest AND (NOT is_deleted) AND ((resource -> 'effectiveDateTime'::text) = to_jsonb('2019-07-31T18:07:52Z'::text)) AND (((resource -> 'subject'::text) ->> 'reference'::text) = 'Patient/e69583dc510c4ed59e362e3ac8c803b5'::text))
   Rows Removed by Filter: 93330
   Heap Blocks: exact=52088
   Buffers: shared hit=226 read=52372
   ->  Bitmap Index Scan on custom__observation__multicol  (cost=0.00..92.19 rows=559 width=0) (actual time=1304.418..1304.418 rows=93331 loops=1)
         Index Cond: (((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb)
         Buffers: shared hit=226 read=284
 Planning time: 1.598 ms
 Execution time: 31768.323 ms

Based on the comment from below, I created two indexes

CREATE INDEX custom__observation__codeindex
    ON fhir.observation USING gin
    (((resource -> 'code'::text) -> 'coding'::text))
    TABLESPACE pg_default;

CREATE INDEX custom__observation__multicol
    ON fhir.observation USING btree
    (((resource -> 'subject'::text) ->> 'reference'::text) COLLATE pg_catalog."default", (resource ->> 'effectiveDateTime'::text) COLLATE pg_catalog."default")
    TABLESPACE pg_default;

Query is much faster and here is the query plan

    QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on observation  (cost=834.97..890.83 rows=1 width=767) (actual time=2599.083..3460.635 rows=1 loops=1)
   Recheck Cond: ((((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb) AND (((resource -> 'subject'::text) ->> 'reference'::text) = 'Patient/e69583dc510c4ed59e362e3ac8c803b5'::text))
   Filter: (is_latest AND (NOT is_deleted) AND ((resource -> 'effectiveDateTime'::text) = to_jsonb('2019-07-31T18:07:52Z'::text)))
   Rows Removed by Filter: 2331
   Heap Blocks: exact=2286
   Buffers: shared hit=217 read=2720
   ->  BitmapAnd  (cost=834.97..834.97 rows=14 width=0) (actual time=1241.916..1241.916 rows=0 loops=1)
         Buffers: shared hit=217 read=434
         ->  Bitmap Index Scan on custom__observation__codeindex  (cost=0.00..128.19 rows=559 width=0) (actual time=951.950..951.950 rows=93331 loops=1)
               Index Cond: (((resource -> 'code'::text) -> 'coding'::text) @> '[{"code": "57485005", "system": "http://snomed.info/sct", "display": "Support"}]'::jsonb)
               Buffers: shared hit=217 read=284
         ->  Bitmap Index Scan on custom__observation__multicol  (cost=0.00..706.52 rows=14130 width=0) (actual time=137.624..137.624 rows=13985 loops=1)
               Index Cond: (((resource -> 'subject'::text) ->> 'reference'::text) = 'Patient/e69583dc510c4ed59e362e3ac8c803b5'::text)
               Buffers: shared read=150
 Planning time: 5.311 ms
 Execution time: 3460.907 ms
(16 rows)

Seems like the two indexes work better but still, the multi col index is not applying the effectiveDateTime property so this is essentially still the same issue but now with two properties instead of three.

1
You might need to update the statistics in order for the query planner to recognise the index as being an optimal strategy. Try analyze table_nameAvin Kavish
Please edit your question and add the create index statement, the query you are using and execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text, no screen shots please. Or upload the plan to explain.depesz.coma_horse_with_no_name
@UniqIdentifierAssignedAtBirth. I tried that and edited my question.step
@a_horse_with_no_name: I added the gin index and the query planstep
The plan does show that the index is used (although that's a "simple" explain, not an explain (analyze, verbose) as I asked for). And showing us the query would also be helpfula_horse_with_no_name

1 Answers

0
votes

To support this query I suggest two different indexes:

One index to support the @> operator:

CREATE INDEX ON fhir.observation USING gin ((resource -> 'code') -> 'coding'));

And a BTree index to support the = operator on the scalar values:

CREATE INDEX 
   ON fhir.observation ( (resource -> 'subject' -> 'reference'), (resource -> 'effectiveDateTime') );

The reason the multi-column index is not used on the second and third expression is that a GIN index only supports the operators: ?, ?&, ?| and @>.

So the GIN index cannot be used for expressions using =.

That's why the second BTree index is needed for the equality conditions.