1
votes

I have big table with 11 mln records. I would like to get one record.

SELECT * 
FROM "PRD".events_log 
WHERE plc_time < '2012-11-19 14:00' 
  AND ((event_type_id IN (1,51) 
       AND machine_id = 1 
       AND island_id = 88) 
   OR (event_type_id IN (2000,2001) 
       AND machine_id=88)) 
ORDER BY plc_time desc 
LIMIT 1

Cost of ordering this query is big, because I don't limit plc_time in both sides. I can't limit this, so can I speed up this?

I have indexes on important fields, so lack of its cannot be a problem.

This query is one of several other in function pl/pgsql.

I heard something about cursor, but I don't know how can I use it.

This is explain analyze this query:

"Limit  (cost=4719.97..4719.97 rows=1 width=850) (actual time=6074.900..6074.901 rows=1 loops=1)"
"  ->  Sort  (cost=4719.97..4720.49 rows=208 width=850) (actual time=6074.897..6074.897 rows=1 loops=1)"
"        Sort Key: plc_time"
"        Sort Method:  top-N heapsort  Memory: 17kB"
"        ->  Bitmap Heap Scan on events_log  (cost=50.07..4718.93 rows=208 width=850) (actual time=248.306..6068.046 rows=6911 loops=1)"
"              Recheck Cond: (((machine_id = 1) AND (event_type_id = ANY ('{1,51}'::integer[]))) OR ((machine_id = 88) AND (event_type_id = ANY ('{2000,2001}'::integer[]))))"
"              Filter: ((plc_time   BitmapOr  (cost=50.07..50.07 rows=1246 width=0) (actual time=244.710..244.710 rows=0 loops=1)"
"                    ->  Bitmap Index Scan on fki_events_type_fk  (cost=0.00..24.98 rows=623 width=0) (actual time=238.529..238.529 rows=832699 loops=1)"
"                          Index Cond: ((machine_id = 1) AND (event_type_id = ANY ('{1,51}'::integer[])))"
"                    ->  Bitmap Index Scan on fki_events_type_fk  (cost=0.00..24.98 rows=623 width=0) (actual time=6.177..6.177 rows=6869 loops=1)"
"                          Index Cond: ((machine_id = 88) AND (event_type_id = ANY ('{2000,2001}'::integer[])))"
"Total runtime: 6075.175 ms"

And analyze table:

INFORMACJA:  analizowanie "PRD.events_log"
INFORMACJA:  "events_log": przeskanowano 30000 z 158056 stron, zawierających 2369701 żywych wierszy i 71270 martwych wierszy; 30000 wierszy w przykładzie, 12488167 szacowanych wszystkich wierszy
Zapytanie zostało wykonane w 52203 ms i nie zwróciło żadnych wyników.

Fast translate:
Scanned 3000 from 158056 pages, contains:
2369701 alive rows and 71270 dead rows.
30000 rows in example, 12488167 estimated all rows
2
explain analyze is worth adding. See stackoverflow.com/tags/postgresql-performance/infoCraig Ringer
plan is suboptimal probably - "Bitmap Index Scan on fki_events_type_fk (cost=0.00..24.98 rows=623 width=0) (actual time=238.529..238.529 rows=832699 loops=1)" - estimation is significantly different than reality. PS - query is optimized for last row, cursor is for first row - so you can try to use a cursor.Pavel Stehule
Which of the conditions are constant and which change from query to query?Erwin Brandstetter
only event_type_id is constant.user1756277

2 Answers

1
votes

Just an idea looking at the query: what if you create a subquery for the selection of the event_type? I can imagine that ordering is costing the most time and when using a subquery might lessen the data which has to be handled:

select * from "prd".events_log where plc_time < '2012-11-19 14:00' and id in (
 select e.id from "prd".events_log e where (e.event_type_id IN (1,51) etc...
   AND machine_id=88))) ORDER BY plc_time desc LIMIT 1;

Another solution could be a minimize the data in memory using another subquery:

select * from "prd".events_log where id in (select e.id from etc..);

The whole idea being that you only ask for the rest of the row when it is needed.

Code with a cursor looks like this:

create or replace function use_lock returns int as $$
declare
 cur refcursor;
 rec RECORD;
begin
 open cur for select .... ;
 loop
  fetch cur into rec;
  exit when not found;
  ..business logic working on the record.
 end loop;
 close cur;
END;
$$ LANGUAGE PLPGSQL STABLE;

Hope this helps,

Loek

0
votes

Try adding an index on plc_time. It will speed up the query.

Without the index on plc_time it will always do a full scan on the table because of ORDER BY plc_time.

UPD: Try ANALYZE the table. Details here http://www.postgresql.org/docs/current/static/sql-analyze.html .