I'm trying to store some reports in postgres as a jsonb field.
DDL:
CREATE TABLE reports (
    id INT,
    report JSONB,
    PRIMARY KEY (id)
);
Conceptually, report has structure like this:
{
    "metainfo": "infodata",
    "expense": {
        "rows": [
            {
                "item": "Repair",
                "cost": 15300.00,
                "ts": "2021-04-24"
            },
            {
                "item": "tractor",
                "cost": 120000.00,
                "ts": "2021-04-03"
            },
            ...
        ]
    }
}
The field set differs between reports, so not all of them have, for example, an "item" field.
Let's assume in our example we have an expense report for April 2021. Now I want to select all items from all reports where cost > 100000.00.
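To make the timings below easier to reproduce, here is roughly how I populate the table (a simplified sketch; my real generator varies the field set and the number of rows per report):

-- Simplified sketch of my test-data generator: every report gets a single
-- expense row here, while the real data varies fields and row counts.
INSERT INTO reports (id, report)
SELECT g,
       jsonb_build_object(
           'metainfo', 'infodata',
           'expense', jsonb_build_object(
               'rows', jsonb_build_array(
                   jsonb_build_object(
                       'item', 'Repair',
                       'cost', (random() * 200000)::numeric(12, 2),
                       'ts', '2021-04-24'
                   )
               )
           )
       )
FROM generate_series(1, 1000000) AS g;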
After generating 1 million reports, I found it takes ~30 seconds to extract this data. Is it possible to create a B-tree index that covers my case and accelerates this query:
select id, arr.item->'cost'
from reports, jsonb_array_elements(report->'expense'->'rows') arr(item)
where (arr.item->'cost')::numeric > 100000::numeric
Here is the explain analyze output (in my test setup the table is named "jsonb1", not "reports"):
Nested Loop  (cost=0.01..4795009.70 rows=66000132 width=36) (actual time=132.281..170719.239 rows=1959507 loops=1)
  ->  Seq Scan on jsonb1  (cost=0.00..470001.04 rows=2000004 width=1790) (actual time=0.098..44013.831 rows=2000004 loops=1)
  ->  Function Scan on jsonb_array_elements arr  (cost=0.01..1.76 rows=33 width=32) (actual time=0.021..0.030 rows=1 loops=2000004)
        Filter: (((item -> 'cost'::text))::numeric > '100000'::numeric)
        Rows Removed by Filter: 0
Planning Time: 0.077 ms
JIT:
  Functions: 6
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.027 ms, Inlining 49.986 ms, Optimization 57.205 ms, Emission 23.538 ms, Total 131.756 ms
Execution Time: 186026.874 ms
I've tried other types of queries, e.g. jsonpath, but all of them lead to a seq scan.
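For reference, the jsonpath variant looked roughly like this (reconstructed from memory, so treat the exact path expression as a sketch):

-- Roughly the jsonpath form I tried; it also ends up as a seq scan.
select id,
       jsonb_path_query(report, '$.expense.rows[*] ? (@.cost > 100000)')->'cost'
from reports
where report @@ '$.expense.rows[*].cost > 100000';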
Index creation:
CREATE INDEX costbtree ON reports USING BTREE (((report->'expense'->'rows'->'cost')::numeric));
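I also wondered whether a GIN index could at least serve the jsonpath form, something like the sketch below, but as far as I understand jsonb_path_ops only helps containment/equality-style matches, not range comparisons like cost > 100000:

-- Sketch: a GIN index that the @@ / @> operators can use in principle,
-- but (as far as I understand) not for range predicates like "> 100000".
CREATE INDEX report_gin ON reports USING GIN (report jsonb_path_ops);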
Postgres version: PostgreSQL 12.2