I have a table with several million rows, and the query I'm running is starting to bog down (more data goes into it everyday). I'm just a lowly web dev, so I kinda muddle along when it comes to database-related tasks. I suspect that a couple of indexes will correct this, or potentially some query rewrites, but was hoping someone would point me in the right direction. Here's the query:
SELECT
impressionreport.sitecode AS site, impressionreport.advertisername AS advertiser, impressionreport.mediafilename AS filename,
SUM(impressionreport.mediafileplays) AS totalplays, SUM(impressionreport.views) AS totalviewers, SUM(impressionreport.impressions) AS totalimpressions,
SUM(CASE WHEN impressionreport.gender LIKE 'male' THEN impressionreport.views ELSE 0 END) AS totalmale,
SUM(CASE WHEN impressionreport.gender LIKE 'female' THEN impressionreport.views ELSE 0 END) AS totalfemale,
SUM(CASE WHEN impressionreport.gender LIKE 'female' AND agegroup.name LIKE 'young' THEN impressionreport.views ELSE 0 END) AS femaleyoung,
SUM(CASE WHEN impressionreport.gender LIKE 'female' AND agegroup.name LIKE 'young adult' THEN impressionreport.views ELSE 0 END) AS femaleyoungadult,
SUM(CASE WHEN impressionreport.gender LIKE 'female' AND agegroup.name LIKE 'adult' THEN impressionreport.views ELSE 0 END) AS femaleadult,
SUM(CASE WHEN impressionreport.gender LIKE 'female' AND agegroup.name LIKE 'senior' THEN impressionreport.views ELSE 0 END) AS femalesenior,
SUM(CASE WHEN impressionreport.gender LIKE 'male' AND agegroup.name LIKE 'young' THEN impressionreport.views ELSE 0 END) AS maleyoung,
SUM(CASE WHEN impressionreport.gender LIKE 'male' AND agegroup.name LIKE 'young adult' THEN impressionreport.views ELSE 0 END) AS maleyoungadult,
SUM(CASE WHEN impressionreport.gender LIKE 'male' AND agegroup.name LIKE 'adult' THEN impressionreport.views ELSE 0 END) AS maleadult,
SUM(CASE WHEN impressionreport.gender LIKE 'male' AND agegroup.name LIKE 'senior' THEN impressionreport.views ELSE 0 END) AS malesenior
FROM impressionreport
LEFT JOIN agegroup ON impressionreport.age >= agegroup.min AND impressionreport.age <= agegroup.max
WHERE
impressionreport.datelocal >= '5-1-2018' AND
impressionreport.datelocal < '5-15-2018' AND
impressionreport.network LIKE '%' AND
impressionreport.sitecode LIKE '%' AND
impressionreport.devicename LIKE '%' AND
impressionreport.advertisername LIKE '%' AND
impressionreport.mediafilename LIKE '%'
GROUP BY impressionreport.sitecode, impressionreport.advertisername, impressionreport.mediafilename
ORDER BY impressionreport.sitecode, impressionreport.advertisername, impressionreport.mediafilename
There are indexes on datelocal, devicename, sitecode, advertisername, mediafilename, gender, age, network (all btree).
EDIT:
GroupAggregate (cost=197785.58..223336.77 rows=533798 width=161) (actual time=3789.770..5819.410 rows=4577 loops=1) Group Key: impressionreport.sitecode, impressionreport.advertisername, impressionreport.mediafilename -> Sort (cost=197785.58..198469.86 rows=1368560 width=103) (actual time=3789.651..4450.374 rows=1384106 loops=1) Sort Key: impressionreport.sitecode, impressionreport.advertisername, impressionreport.mediafilename Sort Method: external merge Disk: 119504kB -> Nested Loop Left Join (cost=0.09..116428.54 rows=1368560 width=103) (actual time=0.029..1485.883 rows=1384106 loops=1) Join Filter: ((impressionreport.age >= agegroup.min) AND (impressionreport.age <= agegroup.max)) Rows Removed by Join Filter: 4885607 -> Index Scan using impressionreport_datelocal_index on impressionreport (cost=0.09..91793.44 rows=1368560 width=75) (actual time=0.020..443.316 rows=1384106 loops=1) Index Cond: ((datelocal >= '2018-05-01 00:00:00'::timestamp without time zone) AND (datelocal < '2018-05-15 00:00:00'::timestamp without time zone)) -> Materialize (cost=0.00..1.02 rows=4 width=40) (actual time=0.000..0.000 rows=4 loops=1384106) -> Seq Scan on agegroup (cost=0.00..1.01 rows=4 width=40) (actual time=0.004..0.005 rows=4 loops=1) Planning time: 0.270 ms Execution time: 5838.433 ms
like
queries. You should try it. – Yavuz Selim