I'm stuck with jsonb indexes and need help. I have a table with jsonb:
+-------+----------+------------------------------------------------------------+-------+
|id |measure_id|parameters |value |
+-------+----------+------------------------------------------------------------+-------+
|564174 |19 |{"1": 12, "2": 59, "5": 79, "6": 249, "7": 248, "8": 412} |42.461 |
|564176 |19 |{"1": 12, "2": 59, "5": 80, "6": 249, "7": 248, "8": 412} |46.198 |
|568244 |19 |{"1": 12, "2": 316, "5": 129, "6": 249, "7": 248, "8": 412} |19.482 |
|568246 |19 |{"1": 12, "2": 316, "5": 130, "6": 249, "7": 248, "8": 412} |20.051 |
|572313 |19 |{"1": 12, "2": 331, "5": 113, "6": 249, "7": 248, "8": 412} |7.098 |
|596434 |19 |{"1": 193, "2": 297, "5": 124, "6": 249, "7": 248, "8": 412}|103.253|
|682354 |22 |{"1": 427, "2": 25, "5": 121, "6": 426, "9": 441, "11": 428}|0.132 |
|686423 |22 |{"1": 427, "2": 60, "5": 72, "6": 426, "9": 443, "11": 428} |0.000 |
|1682439|44 |{"1": 193, "2": 518, "5": 91, "6": 426, "9": 429, "11": 431}|8.321 |
|1686787|44 |{"1": 193, "2": 515, "5": 96, "6": 426, "9": 429, "11": 431}|23.062 |
+-------+----------+------------------------------------------------------------+-------+
It's statistical data: every row has a measure and a set of parameters. The number of parameters differs per measure, so I put them in a jsonb column. What I have to do:
select all distinct measures and parameters:

SELECT DISTINCT
    measure_id,
    jsonb_object_keys(parameters) AS parameter_id,
    parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;

select data from this table:

SELECT d.id,
       d.measure_id,
       CAST(d.attributes AS TEXT) AS attributes,
       CAST(d.parameters AS TEXT) AS parameters,
       d.value
FROM data d
WHERE d.measure_id = 19
  AND jsonb_extract_path(d.parameters, '1')::bigint IN (12)
  AND jsonb_extract_path(d.parameters, '2')::bigint IN (2, 59)
  AND jsonb_extract_path(d.parameters, '5')::bigint IN (79, 80, 129, 130, 113)
  AND jsonb_extract_path(d.parameters, '6')::bigint IN (249)
  AND jsonb_extract_path(d.parameters, '7')::bigint IN (248)
  AND jsonb_extract_path(d.parameters, '8')::bigint IN (412)
ORDER BY d.id;
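(For reference, the first query can also be written with jsonb_each, which expands each row's object only once instead of evaluating jsonb_object_keys twice per row. A sketch, equivalent to the DISTINCT query above but not tested against the real table:)

```sql
-- Sketch: jsonb_each returns all key/value pairs in one pass,
-- so the parameters object is expanded once per row.
SELECT DISTINCT d.measure_id,
       p.key   AS parameter_id,
       p.value AS parameter_value_id
FROM data d
CROSS JOIN LATERAL jsonb_each(d.parameters) AS p(key, value);
```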
Both queries are running slow. My indexes:
CREATE INDEX idx_data_measure ON data USING btree (measure_id);
CREATE INDEX idx_data_parameters
ON data USING btree (((parameters ->> '1'::text)::bigint), ((parameters ->> '2'::text)::bigint),
((parameters ->> '5'::text)::bigint), ((parameters ->> '6'::text)::bigint),
((parameters ->> '7'::text)::bigint), ((parameters ->> '8'::text)::bigint),
((parameters ->> '9'::text)::bigint), ((parameters ->> '10'::text)::bigint),
((parameters ->> '11'::text)::bigint), ((parameters ->> '458'::text)::bigint),
((parameters ->> '717'::text)::bigint), ((parameters ->> '718'::text)::bigint),
((parameters ->> '719'::text)::bigint), ((parameters ->> '720'::text)::bigint));
I've tried to create one combined index:
CREATE INDEX idx_data_parameters ON data USING btree (measure_id, ((parameters ->> '1'::text)::bigint),...
but this doesn't help.
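(An illustration of the usual btree rule of thumb, not a tested fix: a combined index can only be used up to the first non-equality column, so for the measure_id = 19 query above the equality-tested expressions would have to lead and the IN-list columns come last. The column order below is chosen purely for that one query:)

```sql
-- Illustration: equality columns (measure_id, params 1/6/7/8) first,
-- IN-list columns (params 2/5) last, so the btree prefix matches the WHERE clause.
CREATE INDEX idx_data_measure_params ON data USING btree (
    measure_id,
    ((parameters ->> '1')::bigint),
    ((parameters ->> '6')::bigint),
    ((parameters ->> '7')::bigint),
    ((parameters ->> '8')::bigint),
    ((parameters ->> '2')::bigint),
    ((parameters ->> '5')::bigint)
);
```

A drawback is that this index only helps queries shaped like this one; other measures use other parameter keys.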
I've tried EXPLAIN ANALYZE, but honestly I don't understand it :(
EXPLAIN ANALYZE
SELECT DISTINCT
measure_id,
jsonb_object_keys(parameters) AS parameter_id,
parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;
QUERY PLAN
Unique (cost=2212571.28..2222400.17 rows=982889 width=72) (actual time=79346.142..84316.123 rows=5050 loops=1)
-> Sort (cost=2212571.28..2215028.50 rows=982889 width=72) (actual time=79346.141..82358.141 rows=5586011 loops=1)
Sort Key: measure_id, (jsonb_object_keys(parameters)), ((parameters -> (jsonb_object_keys(parameters))))
Sort Method: external merge Disk: 202816kB
-> Gather (cost=1000.00..2034108.05 rows=982889 width=72) (actual time=2467.949..63448.545 rows=5586011 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Result (cost=0.00..1934819.15 rows=40953700 width=72) (actual time=2432.167..63305.298 rows=1862004 loops=3)
-> ProjectSet (cost=0.00..1218129.40 rows=40953700 width=156) (actual time=2432.151..62251.992 rows=1862004 loops=3)
-> Parallel Seq Scan on data (cost=0.00..1010289.37 rows=409537 width=124) (actual time=2432.118..61448.821 rows=327630 loops=3)
Planning Time: 0.417 ms
Execution Time: 84406.575 ms
I feel that I have the wrong indexes, but I can't create them properly. As I understand it, GIN is not a good idea because I need an IN clause for the parameters, so I made a btree index. Please help me with it.
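(For what it's worth, an IN list can be rewritten as OR-ed containment tests, which a GIN index can serve; whether this actually beats the btree depends on the data. A sketch only:)

```sql
-- Sketch: a jsonb_path_ops GIN index supports the @> containment operator.
CREATE INDEX idx_data_parameters_gin ON data USING gin (parameters jsonb_path_ops);

-- IN (2, 59) on key "2" becomes OR-ed containment tests:
SELECT d.id, d.measure_id, d.value
FROM data d
WHERE d.measure_id = 19
  AND d.parameters @> '{"1": 12}'
  AND (d.parameters @> '{"2": 2}' OR d.parameters @> '{"2": 59}');
```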
EDIT 1: PG Version: PostgreSQL 11.8. Also updated query to fit sample data.
EDIT 2: Query plan for select data SELECT...WHERE...:
Sort (cost=1030.03..1030.04 rows=1 width=83) (actual time=63.659..63.661 rows=5 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=4881
-> Index Scan using idx_data_measure on data d (cost=0.55..1030.02 rows=1 width=83) (actual time=0.044..63.635 rows=5 loops=1)
Index Cond: (measure_id = 19)
Filter: (((jsonb_extract_path(parameters, VARIADIC '{2}'::text[]))::bigint = ANY ('{2,59}'::bigint[])) AND ((jsonb_extract_path(parameters, VARIADIC '{1}'::text[]))::bigint = 12) AND ((jsonb_extract_path(parameters, VARIADIC '{6}'::text[]))::bigint = 249) AND ((jsonb_extract_path(parameters, VARIADIC '{7}'::text[]))::bigint = 248) AND ((jsonb_extract_path(parameters, VARIADIC '{8}'::text[]))::bigint = 412) AND ((jsonb_extract_path(parameters, VARIADIC '{5}'::text[]))::bigint = ANY ('{79,80,129,130,113}'::bigint[])))
Rows Removed by Filter: 28733
Buffers: shared hit=4881
Planning Time: 0.451 ms
Execution Time: 64.973 ms
I see that idx_data_measure is being used, and that's all...
Comments:

- The select distinct query has to open up and expand the parameters object for every single row. The indexes you are creating on this table are far more work (and space used) than going with either a wide and sparse table or a related pair of tables. - Mike Organek
- where parameters @> '...' OR parameters @> '...'? It will be a very long query, because the user can select any parameter set. - zhoriq
- measure_id is already good enough to make use of the index on that column, so you probably don't need any additional indexes. - a_horse_with_no_name
- explain (analyze) output from the production server - a_horse_with_no_name