I have the following table:
create table images
(
image_id bigint,
image text,
url text,
post_id bigint,
checksum text,
path varchar
);
create index images_postid_idx
on images (post_id);
create index image_2020_idx
on images (image);
(Yes, there is no primary key! Also there are few other fields that are mostly null and are not relevant for the query.)
This very simple query:
SELECT 1 FROM images where image = 'foo';
produces the following execution plan:
Gather (cost=1000.00..40080349.00 rows=93 width=4) (actual time=339826.750..339933.048 rows=0 loops=1)
Workers Planned: 10
Workers Launched: 10
Buffers: shared hit=527195 read=39504582 dirtied=23
-> Parallel Seq Scan on images (cost=0.00..40079339.70 rows=9 width=4) (actual time=339800.607..339800.607 rows=0 loops=11)
Filter: (image = 'foo'::text)
Rows Removed by Filter: 3459138
Buffers: shared hit=527195 read=39504582 dirtied=23
Planning Time: 3.684 ms
JIT:
Functions: 34
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 4.039 ms, Inlining 382.076 ms, Optimization 121.531 ms, Emission 71.917 ms, Total 579.563 ms"
Execution Time: 339978.002 ms
What could be the cause of that problem? I copied the table, put on the same indices and copied a few thousands rows => everything worked fine there. I also ran ANALYSE images to update stats.
I am not sure why the planner doesn't use the index on this table. There are 28,091,491 distinct values for image out of 380,000,000 rows total. Since my query actually doesn't select anything from the table, why would the planner ever choose something other than an Index-Only Scan?
I am using PG 12.5.
Update:
Output of select * from pg_stats where tablename = 'images' and attname = 'image';: https://pastebin.com/Xeg7DjQd
Update 2:
Output of \d+ images:
Table "public.images"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------+-------------------+-----------+----------+---------+----------+--------------+-------------
image_id | bigint | | | | plain | |
image | text | | | | extended | |
url | text | | | | extended | |
post_id | bigint | | | | plain | |
checksum | text | | | | extended | |
path | character varying | | | | extended | |
field1 | numeric | | | | main | |
field2 | numeric | | | | main | |
field3 | integer | | | | plain | |
field4 | numeric | | | | main | |
field5 | double precision | | | | plain | |
field6 | double precision | | | | plain | |
Indexes:
"image_2020_idx" btree (image)
"images_postid_idx" btree (post_id)
Replica Identity: FULL
Access method: heap
reindexto rebuild it. - jjanestextorvarchar(n)(if both store the same strings) - a_horse_with_no_name\d imagesinpsql? - Laurenz Albe