2 votes

We are running a PostgreSQL 9.6.11 database on Amazon RDS. The execution time of one of the queries is 6633.645 ms, which seems very slow. What changes can I make to improve the execution time of this query?

The query selects three columns, filtering on two of the table's columns (account_id and a date range), and groups by the selected columns.

select
    platform,
    publisher_platform,
    adset_id
FROM "adsets"
WHERE
    (("adsets"."account_id" IN ('1595321963838425', '1320001405', 'urn:li:sponsoredAccount:507697540')) AND
    ("adsets"."date" >= '2019-05-06 00:00:00.000000+0000') AND ("adsets"."date" <= '2019-05-13 23:59:59.999999+0000'))
GROUP BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id"
ORDER BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id";

The query is based on a table called adsets. The table has the following columns:

account_id | text
campaign_id | text
adset_id | text
name | text
date | timestamp without time zone
publisher_platform | text

and 15 other columns which are a mix of integers and text fields.

We have added the following indexes -

  1. "adsets_composite_unique_key" UNIQUE CONSTRAINT, btree (platform, account_id, campaign_id, adset_id, date, publisher_platform)
  2. "adsets_account_id_date_idx" btree (account_id DESC, date DESC) CLUSTER
  3. "adsets_account_id_index" btree (account_id)
  4. "adsets_adset_id_index" btree (adset_id)
  5. "adsets_campaign_id_index" btree (campaign_id)
  6. "adsets_name_index" btree (name)
  7. "adsets_platform_platform_id_publisher_platform" btree (account_id, platform, publisher_platform, adset_id)
  8. "idx_account_date_adsets" btree (account_id, date)
  9. "platform_pub_index" btree (platform, publisher_platform, adset_id)

The work_mem on Postgres has been set to 125MB.

EXPLAIN (ANALYZE) shows:

   Group  (cost=33447.55..33532.22 rows=8437 width=29) (actual time=6625.170..6633.062 rows=2807 loops=1)
   Group Key: platform, publisher_platform, adset_id
   ->  Sort  (cost=33447.55..33468.72 rows=8467 width=29) (actual time=6625.168..6629.271 rows=22331 loops=1)
         Sort Key: platform, publisher_platform, adset_id
         Sort Method: quicksort  Memory: 2513kB
         ->  Bitmap Heap Scan on adsets  (cost=433.63..32895.18 rows=8467 width=29) (actual time=40.003..6471.898 rows=22331 loops=1)
                Recheck Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
               Heap Blocks: exact=52907
               ->  Bitmap Index Scan on idx_account_date_adsets  (cost=0.00..431.51 rows=8467 width=0) (actual time=27.335..27.335 rows=75102 loops=1)
                      Index Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
 Planning time: 5.380 ms
 Execution time: 6633.645 ms
(12 rows)

Explain depesz


2 Answers

1 vote

First of all, you are using GROUP BY without actually selecting any aggregates, so you might as well just use SELECT DISTINCT in your query. This aside, here is the B-tree index which you probably should be using:

CREATE INDEX idx ON adsets (account_id, date, platform, publisher_platform,
    adset_id);

The problem with your current index is that, while it does cover the columns you are selecting, it does not involve the columns which appear in the WHERE clause. This means that Postgres might choose to not even use the index, and rather just scan the entire table.
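
If you want to sanity-check which of the existing indexes the planner actually uses, the standard statistics view pg_stat_user_indexes gives a rough picture (this is only a diagnostic, not part of the fix):

-- Per-index scan counts for the adsets table; idx_scan = 0 means the index has never been used.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'adsets'
ORDER BY idx_scan;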

Note that my suggestion still does nothing to deal with the select distinct portion of the query, but at least it might speed up everything which comes before that part of the query.

Here is your updated query:

SELECT DISTINCT
    platform,
    publisher_platform,
    adset_id
FROM adsets
WHERE
    account_id IN ('1595321963838425', '1320001405',
                   'urn:li:sponsoredAccount:507697540') AND
    date >= '2019-05-06' AND date < '2019-05-14';
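
To check that the rewrite and the new index actually help, re-run the query under EXPLAIN (ANALYZE, BUFFERS) and compare the plan with the one you posted; ideally the bitmap heap scan is replaced by a scan on the new index, but the exact plan will depend on your data:

-- Same query, with timing and buffer statistics (both available in 9.6).
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT platform, publisher_platform, adset_id
FROM adsets
WHERE account_id IN ('1595321963838425', '1320001405',
                     'urn:li:sponsoredAccount:507697540')
  AND date >= '2019-05-06' AND date < '2019-05-14';
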
0 votes

Your problem is the many “false positives” that are found during the bitmap index scan phase and removed during the heap scan phase. Since there is no additional filter, I guess that the extra rows must be removed because they are not visible.

See if a VACUUM adsets will improve the query performance.
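
A rough way to see how many dead rows the table carries, and to vacuum it manually, is something like the following (pg_stat_user_tables is a standard statistics view; the tuple counts are estimates):

-- Estimated live/dead tuples and last (auto)vacuum times for the table.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'adsets';

-- Manual vacuum with verbose output; ANALYZE also refreshes planner statistics.
VACUUM (VERBOSE, ANALYZE) adsets;

If the extra rows really are dead tuples, vacuuming removes them together with their index entries, so the bitmap index scan should return far fewer rows that later have to be discarded during the heap scan.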