I have a PostgreSQL 13 database with the following data:
- 1 materialized view mat_view: 79 rows, columns: id (bigint), start_ts (double precision) and end_ts (double precision)
- 1 table time_tbl: 320 million rows, 9 columns; relevant columns: gen_ts (double precision) and time_id (bigint, primary key), with a btree index on gen_ts and a btree index on time_id
- 1 table v_tbl: 320 million rows, 49 columns; relevant column: _fk_time_id (bigint, foreign key to time_tbl.time_id)
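For reference, a minimal DDL sketch of this layout (only the relevant columns are spelled out; the index names are taken from the execution plans further down, everything else is an assumption based on the description above):

-- time_tbl: 320 million rows, 9 columns in total (7 omitted here)
CREATE TABLE time_tbl (
    time_id bigint PRIMARY KEY,
    gen_ts  double precision
);
CREATE INDEX gen_ts_idx ON time_tbl (gen_ts);

-- v_tbl: 320 million rows, 49 columns in total (48 omitted here)
CREATE TABLE v_tbl (
    _fk_time_id bigint REFERENCES time_tbl (time_id)
);
CREATE INDEX v_tbl_fk_time_id_idx ON v_tbl (_fk_time_id);

-- mat_view: materialized view with 79 rows and columns
-- id (bigint), start_ts (double precision), end_ts (double precision)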
The following query uses fixed timestamps (1500 and 2772) and takes around 800 ms:
WITH d AS (
    SELECT * FROM time_tbl
    WHERE gen_ts BETWEEN 1500 AND 2772
)
SELECT * FROM d JOIN v_tbl ON d.time_id = _fk_time_id
The output of EXPLAIN ANALYZE is:
Nested Loop Inner Join
- Index Scan on time_tbl
- Index Scan on v_tbl
In the example above I use fixed timestamps. When I change it to the following:
WITH ts AS (
    SELECT start_ts, end_ts
    FROM mat_view
    WHERE id = 60
), d AS (
    SELECT * FROM time_tbl, ts
    WHERE gen_ts BETWEEN ts.start_ts AND ts.end_ts
)
SELECT * FROM d JOIN v_tbl ON d.time_id = _fk_time_id
It takes more than 50 seconds to complete, even though the time frame is the same.
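That the stored bounds really match the fixed values can be checked directly, e.g.:

SELECT start_ts, end_ts
FROM mat_view
WHERE id = 60;
-- returns the same range as the fixed values used above (1500 and 2772)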
The output of EXPLAIN ANALYZE is:
Merge Inner Join
- Index Scan using _fk_time_id_idx on v_tbl (major costs!)
- Materialize
- Sort
- Nested Loop Inner Join
- Seq Scan on mat_view (minor costs)
- Index Scan on time_tbl
Does anyone have an idea how I can speed up this query?
EDIT 1
Sorry, Laurenz. EXPLAIN (ANALYZE, BUFFERS) returns:
Merge Join (cost=10305331.73..26533267.34 rows=36225336 width=286) (actual time=66017.916..66028.416 rows=12747 loops=1)
  Merge Cond: (v_tbl._fk_time_id = time_tbl.time_id)
  Buffers: shared hit=40 read=4474243 written=1
  ->  Index Scan using v_tbl_fk_time_id_idx on v_tbl (cost=0.57..14793820.93 rows=326028024 width=222) (actual time=0.164..52890.886 rows=244510550 loops=1)
        Buffers: shared hit=4 read=4474084 written=1
  ->  Materialize (cost=10303264.73..10481232.72 rows=35593597 width=64) (actual time=6.406..8.218 rows=12747 loops=1)
        Buffers: shared hit=36 read=159
        ->  Sort (cost=10303264.73..10392248.73 rows=35593597 width=64) (actual time=6.402..6.914 rows=12747 loops=1)
              Sort Key: time_tbl.time_id
              Sort Method: quicksort  Memory: 2177kB
              Buffers: shared hit=36 read=159
              ->  Nested Loop (cost=0.57..1824200.47 rows=35593597 width=64) (actual time=0.395..4.310 rows=12747 loops=1)
                    Buffers: shared hit=36 read=159
                    ->  Seq Scan on mat_view (cost=0.00..1.99 rows=1 width=16) (actual time=0.016..0.017 rows=1 loops=1)
                          Filter: (id = 60)
                          Rows Removed by Filter: 78
                          Buffers: shared hit=1
                    ->  Index Scan using gen_ts_idx on time_tbl (cost=0.57..1468262.51 rows=35593597 width=48) (actual time=0.372..3.205 rows=12747 loops=1)
                          Index Cond: ((gen_ts >= mat_view.start_ts) AND (gen_ts <= mat_view.end_ts))
                          Buffers: shared hit=35 read=159
Planning:
  Buffers: shared hit=12 read=8 dirtied=2
Planning Time: 0.980 ms
JIT:
  Functions: 16
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.298 ms, Inlining 5.521 ms, Optimization 101.687 ms, Emission 58.024 ms, Total 167.530 ms
Execution Time: 66031.189 ms
Comments:
How many records does d have? If it has hundreds of thousands of records, you can instead create a table called temp_d and insert data into it. Put an index on time_id and try the query again. See if that made any improvements. What's the acceptable worst-case time for a query like that in your mind? 20s? - zedfoxus
d takes 70 ms, the first query with defined timestamps takes around 800 ms. But the combination of both takes so much (50 sec) longer. - pio
gen_ts (double) - why is this a double if it is supposed to represent a timestamp? - wildplasser
Please add the EXPLAIN (ANALYZE, BUFFERS) output. - Laurenz Albe
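A minimal sketch of the temp_d idea from zedfoxus's comment, using the table and column names from the question (untested, just to illustrate the suggestion):

-- Materialize d into an indexed temporary table instead of a CTE
CREATE TEMP TABLE temp_d AS
SELECT t.*
FROM time_tbl t,
     (SELECT start_ts, end_ts FROM mat_view WHERE id = 60) ts
WHERE t.gen_ts BETWEEN ts.start_ts AND ts.end_ts;

CREATE INDEX ON temp_d (time_id);
ANALYZE temp_d;  -- give the planner real row counts for temp_d

SELECT *
FROM temp_d d
JOIN v_tbl ON d.time_id = v_tbl._fk_time_id;

With accurate statistics on temp_d, the planner is more likely to pick the same nested loop plan that the fixed-timestamp query gets.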