I believe postgresql can be faster dealing with my query but every attempt to modify it makes it slower!
I have 2 tables:
- statistics (id, field1, [...], field10)
- geometry (id, geom)
I created indexes on :
- statistics.id
- geometry.id
- geometry (st_x(st_centroid(st_transform(geom, 2154))), st_y(st_centroid(st_transform(geom, 2154))))
Here is the query
EXPLAIN ANALYZE SELECT
statistics.*,
st_x(st_centroid(st_transform(geometry.geom, 2154))) AS x,
st_y(st_centroid(st_transform(geometry.geom, 2154))) AS y
FROM statistics
JOIN geometry ON statistics.id = geometry.id
WHERE statistics.id not like '97%';
Here is the result
Hash Join (cost=1294.66..5158.10 rows=36593 width=342) (actual time=20.788..1085.257 rows=36552 loops=1)
Hash Cond: (geometry.id = (statistics.id)::text)
-> Seq Scan on geometry (cost=0.00..2445.46 rows=36593 width=279) (actual time=0.010..25.271 rows=36597 loops=1)
Filter: (id !~~ '97%'::text)
-> Hash (cost=835.96..835.96 rows=36696 width=69) (actual time=19.892..19.892 rows=36696 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 3780kB
-> Seq Scan on statistics (cost=0.00..835.96 rows=36696 width=69) (actual time=0.005..6.871 rows=36696 loops=1)
Planning time: 0.401 ms
Execution time: 1088.612 ms
The most expensive operation is the Hash Join. How would you reorganise there query to get a better result?
Below is the schema of the tables
CREATE TABLE "statistics" (
"REG" integer,
"DEP" character varying(10),
"COM" character varying(50),
"D03" integer,
"D04" integer,
"D05" integer,
"D06" integer,
"D07" integer,
"D08" integer,
"D09" integer,
"D10" integer,
"D11" integer,
"D12" integer,
"D13" integer,
"id" text
);
CREATE TABLE geometry (
id text NOT NULL,
id_geo numeric(10,0),
cm_code character varying(3),
name character varying(50),
status character varying(20),
lat integer,
long integer,
lat_centroid integer,
long_centroid integer,
z_ smallint,
area numeric(10,0),
population double precision,
code_ct character varying(2),
code_r character varying(1),
code_dp character varying(2),
name_dp character varying(30),
code_rg character varying(2),
geom geometry(MultiPolygon,4326),
x real,
y real
);
There is about 40 000 rows in each table
The indexes have been created as follow
CREATE INDEX statistics_id_idx ON public.statistics USING btree (id COLLATE pg_catalog."default");
CREATE INDEX geometry_geom_idx ON public.geometry USING gist (geom);
CREATE INDEX geometry_id_gin2 ON public.geometry USING gin (id COLLATE pg_catalog."default" gin_trgm_ops);
for information I have tried different indexes (btree & gin) on geometry_id & statistics_id.
like
andgeometry.id
is different type ofstatistics.id
?? I dont understand your geometry index. for st_x. That wont help for that select. – Juan Carlos Oropeza