0 votes

I'm trying to store some reports in postgres as a jsonb field.

DDL:

CREATE TABLE reports (
    id INT, 
    report JSONB, 
    PRIMARY KEY (id)
)

Conceptually, a report has a structure like this:

{
    "metainfo": "infodata",
    "expense": {
        "rows": [
            {
                "item": "Repair",
                "cost": 15300.00,
                "ts": "2021-04-24"
            },
            {
                "item": "tractor",
                "cost": 120000.00,
                "ts": "2021-04-03"
            },
            ...
        ]
    }
}

The field set differs between reports, so not all of them have, for example, an "item" field.

Let's assume in our example we have an expense report for April 2021. Now I want to select all items from all reports where cost > 100000.00.

After generating 1 million reports, I found it takes ~30 seconds to extract this data. Is it possible to create a B-tree index that covers my case and accelerates my query:

select id, arr.item->'cost'
from reports, jsonb_array_elements(report->'expense'->'rows') arr(item)
where (arr.item->'cost')::numeric > 100000::numeric

EXPLAIN ANALYZE output (my table name is "jsonb1", not "reports"):

Nested Loop  (cost=0.01..4795009.70 rows=66000132 width=36) (actual time=132.281..170719.239 rows=1959507 loops=1)
->  Seq Scan on jsonb1  (cost=0.00..470001.04 rows=2000004 width=1790) (actual time=0.098..44013.831 rows=2000004 loops=1)
->  Function Scan on jsonb_array_elements arr  (cost=0.01..1.76 rows=33 width=32) (actual time=0.021..0.030 rows=1 loops=2000004)
        Filter: (((item -> 'cost'::text))::numeric > '100000'::numeric)
        Rows Removed by Filter: 0
Planning Time: 0.077 ms
JIT:
  Functions: 6
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.027 ms, Inlining 49.986 ms, Optimization 57.205 ms, Emission 23.538 ms, Total 131.756 ms
Execution Time: 186026.874 ms

I've tried other types of queries, e.g. jsonpath, but all of them lead to a seq scan. Index creation:

CREATE INDEX costbtree ON reports USING BTREE (((report->'expense'->'rows'->'cost')::numeric));
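For reference, a jsonpath formulation of the same filter (just a sketch, not necessarily the exact query I ran; it also ends in a seq scan) might look like:

select id, jsonb_path_query(report, '$.expense.rows[*].cost ? (@ > 100000)')
from reports
where report @@ '$.expense.rows[*].cost > 100000';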

Postgres version: PostgreSQL 12.2

It may be because of the statistics of these columns. The query planner may decide that it is faster and cheaper just to scan the table instead of using the proposed index. – Adam Tokarski
Please run this query with explain analyze and add the output to your question. – Adam Tokarski
Also, which version of Postgres are you using? – Adam Tokarski
@AdamTokarski, I've updated the question. I also added one level to the JSON structure, to be closer to the real example. – Kirill Bobrov
Your current query normalizes the data on the fly, which is why it can't really be indexed. If you normalize it right away, indexing would be quite simple. – a_horse_with_no_name

2 Answers

0 votes

You cannot index such a query in PostgreSQL. You could use a GIN index if you wanted to search for rows that have a cost of exactly 100000, but searching with > is not possible.
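A sketch of that equality-only approach (a GIN expression index with jsonb_path_ops and the @> containment operator; shown only as an illustration) could look like:

CREATE INDEX ON reports USING gin ((report->'expense'->'rows') jsonb_path_ops);

-- finds reports containing a row with cost exactly 100000; > cannot use this index
SELECT id
FROM reports
WHERE report->'expense'->'rows' @> '[{"cost": 100000}]';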

The underlying problem is your data model: if you don't store the data as JSON, but use a normalized data model where each of the rows is an actual table row, it would be quite simple to write such a query and index it for efficiency.
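A minimal sketch of such a normalized model (the table expense_rows and its columns are placeholder names) could look like:

-- one table row per expense row
CREATE TABLE expense_rows (
    report_id INT REFERENCES reports (id),
    item      TEXT,
    cost      NUMERIC,
    ts        DATE
);

CREATE INDEX ON expense_rows (cost);

-- the filter becomes a plain comparison on a regular column
SELECT report_id, cost
FROM expense_rows
WHERE cost > 100000;

With this layout the predicate is a simple indexable comparison, so the B-tree index on cost can be used directly.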

0 votes

You can index the maximum cost of each report to speed it up. You need to define a helper function first (it has to be IMMUTABLE so it can be used in an index expression):

CREATE OR REPLACE FUNCTION public.max(jsonb, text)
 RETURNS numeric
 LANGUAGE sql
 IMMUTABLE PARALLEL SAFE
AS $function$
    SELECT max((x->>$2)::numeric) from jsonb_array_elements($1) f(x) 
$function$;

create index on reports (max(report->'expense'->'rows','cost'));

select id, arr.row->'cost'
from reports, jsonb_array_elements(report->'expense'->'rows') arr(row)
where (arr.row->'cost')::numeric > 100000::numeric
  and max(report->'expense'->'rows','cost') > 100000;

You could hard-code the constants 'expense', 'rows', and 'cost' directly into the function if you wanted to make the usage more concise, at the expense of making the function less general.
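A sketch of that specialized variant (the name max_expense_cost is just a placeholder) might look like:

-- same idea as above, but with the JSON path baked in
CREATE OR REPLACE FUNCTION public.max_expense_cost(jsonb)
 RETURNS numeric
 LANGUAGE sql
 IMMUTABLE PARALLEL SAFE
AS $function$
    SELECT max((x->>'cost')::numeric)
    FROM jsonb_array_elements($1->'expense'->'rows') f(x)
$function$;

create index on reports (max_expense_cost(report));

The query's extra condition would then be max_expense_cost(report) > 100000 instead of the three-argument call.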