20
votes

It seems there is a limit on the number of jobs that the Quartz scheduler can run per second. In our scenario, roughly 20 jobs fire per second, 24x7. Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a database connection pool of 100 for a JDBC-backed JobStore). However, when we increased the load to 20 jobs per second, Quartz became very slow: triggered jobs ran far later than their scheduled times, causing many misfires and eventually degrading the overall performance of the system significantly. One interesting fact: for such delayed triggers, the actual fire time is 10-20 minutes, and sometimes even more, after the time reported by JobExecutionContext.getScheduledFireTime().getTime().
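For illustration, here is a minimal sketch (not our production code) of how that lateness can be measured inside a job; the 1-second threshold is arbitrary:

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class LagLoggingJob implements Job {
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // How far behind schedule this firing actually is
        long lagMs = context.getFireTime().getTime()
                   - context.getScheduledFireTime().getTime();
        if (lagMs > 1000) { // illustrative threshold
            System.err.println("Job fired " + lagMs + " ms after its scheduled time");
        }
        // ... actual work ...
    }
}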

How many jobs per second can the Quartz scheduler run without affecting the scheduled times of the jobs, and what is the optimum number of Quartz threads for such a load?

Or am I missing something here?

Details about what we want to achieve:

We have almost 10k items (categorized among two or more categories; in the current case, two) on which we need to do some processing at a given frequency, e.g. every 15, 30, or 60 minutes. These items should be processed within that frequency window, subject to a given per-minute throttle. For example, at a 60-minute frequency, 5k items per category should be processed with a throttle of 500 items per minute. So, ideally, these items would be processed within the first 10 (5000/500) minutes of each hour, with 500 items per minute distributed evenly across the seconds of that minute, i.e. around 8-9 items per second for one category.
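As a minimal sketch, the arithmetic above with the 60-minute example's numbers:

int itemsPerCategory = 5000;   // items due each cycle, per category
int throttlePerMinute = 500;   // processing cap
int minutesUsed = itemsPerCategory / throttlePerMinute;   // 10 minutes per hour
double itemsPerSecond = throttlePerMinute / 60.0;         // ~8.3 items per second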

Now, to achieve this, we use Quartz as the scheduler, which triggers jobs for processing these items. However, we don't process each item within the Job.execute method, because that processing involves a webservice call and takes 5-50 seconds per item (averaging 30 seconds). Instead, we push a message for each item onto a JMS queue, and separate server machines process those jobs. I have observed that the Job.execute method itself takes no more than 30 milliseconds.
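A minimal sketch of such a "thin" job, assuming the connection factory and queue are wired in externally, and using a hypothetical "itemId" JobDataMap key to identify the item:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class EnqueueItemJob implements Job {

    private ConnectionFactory connectionFactory; // injected; hypothetical wiring
    private Queue itemQueue;                     // injected; hypothetical wiring

    public void execute(JobExecutionContext context) throws JobExecutionException {
        // "itemId" is a hypothetical key identifying the item to process
        String itemId = context.getMergedJobDataMap().getString("itemId");
        Connection connection = null;
        try {
            connection = connectionFactory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(itemQueue);
            // The heavy per-item work happens on the consumer machines
            producer.send(session.createTextMessage(itemId));
        } catch (JMSException e) {
            throw new JobExecutionException(e);
        } finally {
            if (connection != null) {
                try { connection.close(); } catch (JMSException ignored) { }
            }
        }
    }
}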

Server Details:

A Solaris SPARC 64-bit server with an 8-core/16-thread CPU and 16GB RAM runs the scheduler, and we have two such machines in the scheduler cluster.

5
If you need to do so much parallel and constant processing, do you think that Quartz is the right tool? Why not implement a consumer architecture with N threads waiting on whatever produces data? – cherouvim
It seems like constant processing, but it is not. I have added more details to my question now. If you have any other suitable tool for achieving this, please suggest it. – vikas
Did you manage to solve the problem? I believe I'm stuck on something similar atm... – thorinkor

5 Answers

12
votes

In a previous project, I was confronted with the same problem. In our case, Quartz performed well up to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.

We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system would get that job 'set' to execute. The n tasks in the set are then taken over by a "micro-scheduler": basically a timing facility that used the native JDK API to further time the jobs down to 10ms granularity.

To handle the individual jobs, we used a master-worker design, where the master was taking care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.

If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:

import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor scheduledExecutor;
...
    scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...

// Evenly spread the execution of a set of tasks over a period of time.
// Task is assumed to be your item type and to implement Runnable.
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size(); // gap between consecutive tasks
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}

This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries if supported, etc.)
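For example, one hypothetical way to bolt on retries (not part of our original implementation) is to wrap each task so that a failure reschedules it on the same executor:

import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Wraps a task; on failure, reschedules itself with one fewer retry left.
class RetryingTask implements Runnable {
    private final Runnable delegate;
    private final ScheduledThreadPoolExecutor executor;
    private final int retriesLeft;
    private final long retryDelayMs;

    RetryingTask(Runnable delegate, ScheduledThreadPoolExecutor executor,
                 int retriesLeft, long retryDelayMs) {
        this.delegate = delegate;
        this.executor = executor;
        this.retriesLeft = retriesLeft;
        this.retryDelayMs = retryDelayMs;
    }

    public void run() {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            if (retriesLeft > 0) {
                executor.schedule(
                    new RetryingTask(delegate, executor, retriesLeft - 1, retryDelayMs),
                    retryDelayMs, TimeUnit.MILLISECONDS);
            } // else: log it, or push it to a dead-letter destination
        }
    }
}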

With some testing + tuning, we found an optimal balance between the Quartz jobs and the amount of jobs in one scheduled set.

We achieved a 100X throughput improvement this way. Network bandwidth turned out to be our actual limit.

6
votes

First of all check How do I improve the performance of JDBC-JobStore? in Quartz documentation.

As you can probably guess, there is no absolute value or definitive metric. It all depends on your setup. However, here are a few hints:

  • 20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!

  • Consider distributing your Quartz setup into a cluster. However, if the database is the bottleneck, it won't help you. Maybe TerracottaJobStore will come to the rescue? (A configuration sketch follows this list.)

  • With K cores in the system, anything fewer than K threads will underutilize your machine. If your jobs are CPU-intensive, K threads is fine. If they call external web services, block, or sleep, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching.

  • Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
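For illustration, here is a sketch of how some of these knobs could be set programmatically. The values are illustrative, not recommendations; the data-source properties are omitted; and the batch-acquisition property exists only in Quartz 2.x:

import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;

public class TunedSchedulerFactory {
    public static Scheduler create() throws SchedulerException {
        Properties props = new Properties();
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        props.setProperty("org.quartz.threadPool.class",
                "org.quartz.simpl.SimpleThreadPool");
        props.setProperty("org.quartz.threadPool.threadCount", "100");
        props.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        props.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        // Quartz 2.x only: fetch several due triggers per database round-trip
        props.setProperty("org.quartz.scheduler.batchTriggerAcquisitionMaxCount", "10");
        // ... plus org.quartz.jobStore.dataSource and org.quartz.dataSource.* settings
        return new StdSchedulerFactory(props).getScheduler();
    }
}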

2
votes

You should limit your number of threads to somewhere between n and n*3, where n is the number of processors available. Spinning up more threads will cause a lot of context switching, since most of them will be blocked most of the time.
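Expressed directly (a trivial sketch, with Runtime.availableProcessors() standing in for n):

int n = Runtime.getRuntime().availableProcessors();
int minThreads = n;      // lower bound from the rule above
int maxThreads = 3 * n;  // upper bound from the rule above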

As far as jobs per second goes, it really depends on how long the jobs run and how often they're blocked on operations like network and disk I/O.

Also, consider that perhaps Quartz isn't the tool you need. If you're sending off 1-2 million jobs a day, you might want to look into a custom solution. What are you even doing with 2 million jobs a day?!

Another option, which is a really bad way to approach the problem but sometimes works: what server is it running on? Is it an older server? Bumping up the RAM or other specs might give you some extra 'umph'. Not the best solution, for sure, because it delays the problem rather than addressing it, but if you're in a crunch it might help.

0
votes

In situations with a high number of jobs per second, make sure your SQL server uses row-level locking and not table-level locking. In MySQL, this means using the InnoDB storage engine rather than the default MyISAM storage engine, which supports only table-level locks.
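For illustration, a hypothetical one-off migration that converts a few of the Quartz tables to InnoDB (assuming the default QRTZ_ table prefix and a placeholder JDBC URL; in practice you would convert all QRTZ_ tables):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ConvertQuartzTablesToInnoDB {
    public static void main(String[] args) throws SQLException {
        String[] tables = {"QRTZ_TRIGGERS", "QRTZ_FIRED_TRIGGERS", "QRTZ_LOCKS"};
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/quartz", "user", "password");
        try {
            Statement stmt = conn.createStatement();
            for (String table : tables) {
                // Switch the storage engine so MySQL takes row-level locks
                stmt.executeUpdate("ALTER TABLE " + table + " ENGINE=InnoDB");
            }
        } finally {
            conn.close();
        }
    }
}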

0
votes

Fundamentally, the approach of processing one item at a time is doomed to be inefficient when you're dealing with such a large number of things to do in such a short time. You need to group things: the suggested approach of using a job set that then micro-schedules each individual job is a first step, but it still means doing a whole lot of almost nothing per job.

Better would be to improve your webservice so you can tell it to process N items at a time, and then invoke it with sets of items to process. Better still would be to avoid doing this sort of thing via webservices at all and to process the items inside a database, as sets, which is what databases are good for. Any job that processes one item at a time is a fundamentally unscalable design.
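As a minimal sketch of the batching idea (a hypothetical helper, not tied to any particular webservice API), split the work into chunks of N so each scheduled job or webservice call handles a set of items rather than a single one:

import java.util.ArrayList;
import java.util.List;

public final class Batches {
    // Splits items into consecutive chunks of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<T>(
                    items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }
}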