I am trying to pass some ids into an in-clause on a sorted index with the same order by condition but the query planner is explicitly sorting the data after performing index search. below are my queries.
Generate a temporary table.
SELECT a.n/20 as n, md5(a.n::TEXT) as b INTO temp_table From generate_series(1, 100000) as a(n);
create an index
CREATE INDEX idx_temp_table ON temp_table(n ASC, b ASC);
In below query, planner uses index ordering and doesn't explicitly sorts the data.(expected)
EXPLAIN ANALYSE SELECT * from temp_table WHERE n = 10 ORDER BY n, b limit 5;
Query Plan
QUERY PLAN Limit (cost=0.42..16.07 rows=5 width=36) (actual time=0.098..0.101 rows=5 loops=1)
-> Index Only Scan using idx_temp_table on temp_table (cost=0.42..1565.17 rows=500 width=36) (actual time=0.095..0.098 rows=5 loops=1)
Index Cond: (n = 10)
Heap Fetches: 5 Planning time: 0.551 ms Execution time: 0.128 ms
but when i use one or more ids from a cte and pass them in clause then planner only uses index to fetch the values but explicitly sorts them afterwards (not expected).
EXPLAIN ANALYSE WITH cte(x) AS (VALUES (10)) SELECT * from temp_table WHERE n IN ( SELECT x from cte) ORDER BY n, b limit 5;
then planner uses below query plan
QUERY PLAN
QUERY PLAN
Limit (cost=85.18..85.20 rows=5 width=37) (actual time=0.073..0.075 rows=5 loops=1)
CTE cte
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
-> Sort (cost=85.16..85.26 rows=40 width=37) (actual time=0.072..0.073 rows=5 loops=1)
Sort Key: temp_table.n, temp_table.b
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.47..84.50 rows=40 width=37) (actual time=0.037..0.056 rows=40 loops=1)
-> Unique (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
-> Sort (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
Sort Key: cte.x
Sort Method: quicksort Memory: 25kB
-> CTE Scan on cte (cost=0.00..0.04 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
-> Index Only Scan using idx_temp_table on temp_table (cost=0.42..42.02 rows=20 width=37) (actual time=0.012..0.018 rows=20 loops=2)
Index Cond: (n = cte.x)
Heap Fetches: 40
Planning time: 0.166 ms
Execution time: 0.101 ms
I tried putting an explicit sorting while passing the ids in where clause so that sorted order in ids is maintained but still planner sorted explicitly
EXPLAIN ANALYSE WITH cte(x) AS (VALUES (10)) SELECT * from temp_table WHERE n IN ( SELECT x from cte) ORDER BY n, b limit 5;
Query plan
QUERY PLAN
Limit (cost=42.62..42.63 rows=5 width=37) (actual time=0.042..0.044 rows=5 loops=1)
CTE cte
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
-> Sort (cost=42.61..42.66 rows=20 width=37) (actual time=0.042..0.042 rows=5 loops=1)
Sort Key: temp_table.n, temp_table.b
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.46..42.28 rows=20 width=37) (actual time=0.025..0.033 rows=20 loops=1)
-> HashAggregate (cost=0.05..0.06 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
Group Key: cte.x
-> Sort (cost=0.03..0.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Sort Key: cte.x
Sort Method: quicksort Memory: 25kB
-> CTE Scan on cte (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
-> Index Only Scan using idx_temp_table on temp_table (cost=0.42..42.02 rows=20 width=37) (actual time=0.014..0.020 rows=20 loops=1)
Index Cond: (n = cte.x)
Heap Fetches: 20
Planning time: 0.167 ms
Execution time: 0.074 ms
Can anyone explain why planner is using an explicit sort on the data? Is there a way to by pass this and make planner use the index sorting order so additional sorting on the records can be saved. In production, we have similar case but size of our selection is too big but only a handful of records needs to fetched with pagination. Thanks in anticipation!
in
clause without a CTE, what is the query plan? I suspect that is the issue, not the CTE. If so, you might be able to fix the problem usingexists
. – Gordon Linoff