0
votes

I have a table that has over 2.5 million rows and I would like to run the following SQL Statment to get the

select count(*) 
  from workflow 
 where action_name= 'Workflow' 
   and release_date >= '2019-12-01 13:24:22' 
   and release_date <= '2019-12-31 13:24:22' 
   AND project_name= 'Web' 
 group 
    by page_id
     , headline
     , release_full_name
     , release_date

The problem is that it takes over 2.7 seconds to return 0 rows as expected. Is there a way to speed it up more? I have 6 more SQL Statements that are similiar so that will take almost (2.7 seconds * 6) = 17 seconds at least.

Here is my table schema

CREATE TABLE workflow (
 id int(11) NOT NULL AUTO_INCREMENT,
 action_name varchar(100) NOT NULL,
 project_name varchar(30) NOT NULL,
 page_id int(11) NOT NULL,
 headline varchar(200) NOT NULL,
 create_full_name varchar(200) NOT NULL,
 create_date datetime NOT NULL,
 change_full_name varchar(200) NOT NULL,
 change_date datetime NOT NULL,
 release_full_name varchar(200) NOT NULL,
 release_date datetime NOT NULL,
 reject_full_name varchar(200) NOT NULL,
 reject_date datetime NOT NULL,
 PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=2948271 DEFAULT CHARSET=latin1

enter image description here What I'm looking for in this query is to get the count of the pages that were released last month. that have project_name = "web" and action_name = "Workflow"

2
These days, I think 2.5 million is generally considered modest. Questions about query performance ALWAYS require, as a minimum, CREATE TABLE statements for ALL relevant tables, and the results of the EXPLAIN for the given query.Strawberry
Post the current table schema and index info. Also, have you obtained any execution plans that would indicate where the bottleneck(s) are?underscore_d
Also, note that this returns results per (page_id, headline, release_full_name, start_date), but there's no way of knowing which count belongs to which result, which seems odd.Strawberry
I have edited the post to include the structure of the table i'm using. Thanks!Alan Smith

2 Answers

1
votes

This is bit bigger for comments

  1. Using Group by with Count function doesn't make any sense. Usually you need to count actual rows in DB not after aggregation. Not sure if this is your actual requirement reason being GROUP BY causes slowness of the query.

  2. Use composite Index on (Web, start_date) as column project seems highest selective.

For other information, Please share the explain plan.

1
votes

Assuming that you need counts for groups (you had listed), better to include the group fields in select (essentially) like

select page_id, headline, release_full_name, release_date, count(*) 
  from ...

Adding an index with (page_id, headline) would optimize well.