How to gracefully handle thousands of Quartz misfires?

Question

We have an application that needs to

nightly reprocess large amounts of data, and
reprocess large amounts of data on demand.

In both of these cases, around 10,000 Quartz jobs get spawned and then run. In the nightly case, we have one Quartz cron job that spawns the 10,000 jobs, each of which individually does the work of processing the data.

The issue we have is that we are running with around 30 threads, so naturally the Quartz jobs misfire, and they keep misfiring until everything is processed. The processing can take up to 6 hours. Each of these 10,000 jobs pertains to a specific domain object, can be processed in parallel, and is completely independent of the others. Each job can take a variable amount of time (from half a second to a minute).
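To see why the misfires are unavoidable with this setup, here is a simplified model (a sketch, not how Quartz works internally): all jobs are due at the same instant, a fixed pool of threads picks them up in order, and a job counts as "late" if it starts more than the misfire threshold after its fire time. The 60-second threshold matches Quartz's default `org.quartz.jobStore.misfireThreshold` of 60000 ms; the uniform 30-second job duration is a hypothetical value, not a figure from the question.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class MisfireSimulation {

    /**
     * Simplified model: every job is due at time 0 and runs on whichever of
     * `threads` workers frees up first. Returns how many jobs start more than
     * misfireThresholdMs after their (shared) fire time.
     */
    static int countLateStarts(long[] durationsMs, int threads, long misfireThresholdMs) {
        // Each entry is the time (ms) at which one worker thread becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int t = 0; t < threads; t++) {
            freeAt.add(0L);
        }
        int late = 0;
        for (long duration : durationsMs) {
            long start = freeAt.poll();       // earliest available thread
            if (start > misfireThresholdMs) {
                late++;                       // started past the threshold
            }
            freeAt.add(start + duration);     // thread is busy until this job ends
        }
        return late;
    }

    public static void main(String[] args) {
        long[] durations = new long[10_000];
        Arrays.fill(durations, 30_000L);      // hypothetical: every job takes 30 s
        int late = countLateStarts(durations, 30, 60_000L);
        System.out.println(late + " of " + durations.length + " jobs start more than 60 s late");
    }
}
```

With 30 threads and 30-second jobs, only the first three waves (90 jobs) start within 60 s; the other 9,910 are past the threshold, so nearly every trigger misfires regardless of how the misfire instructions are tuned.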

My question is:

Is there a better way to do this?
If not, what is the best way for us to schedule/set up our Quartz jobs so that a minimal amount of time is spent thrashing and dealing with misfires?

A note about our architecture: we are running two clusters with three nodes apiece. The version of Quartz is a bit old (2.0.1), and clustering is enabled in the quartz.properties file.

Is there any way you could distribute the workload evenly over the day (e.g. with queues)? — Sami Korhonen
For the nightly run we can do this, but on demand it needs to execute as fast as possible. One of my thoughts was to count the Quartz jobs to be created, take the average execution time of a single job, and then schedule the Quartz jobs accordingly, also accounting for the number of threads. This would alleviate the misfiring slightly, but the execution time of a single job is too variable, and always assuming the worst-case scenario would take too long for on-demand processing. — Brett McLain
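The staggering idea in the comment above boils down to placing job i in wave i / threads and offsetting each wave by a fixed per-job duration. A minimal sketch of that arithmetic, assuming the worst case from the question (60 s per job, 30 threads, 10,000 jobs); the class and method names are hypothetical:

```java
public class StaggeredSchedule {

    /** Start offset (seconds) for job i when jobs run in waves of `threads`. */
    static long startOffsetSeconds(int jobIndex, int threads, long secondsPerJob) {
        // Integer division gives the wave number; each wave starts secondsPerJob later.
        return (long) (jobIndex / threads) * secondsPerJob;
    }

    /** Time until the last wave finishes, assuming every job takes secondsPerJob. */
    static long totalSeconds(int jobs, int threads, long secondsPerJob) {
        long waves = (jobs + threads - 1) / threads;   // ceil(jobs / threads)
        return waves * secondsPerJob;
    }

    public static void main(String[] args) {
        long worst = totalSeconds(10_000, 30, 60);
        System.out.printf("worst-case plan: %d s (%.1f h)%n", worst, worst / 3600.0);
        System.out.println("last job starts at " + startOffsetSeconds(9_999, 30, 60) + " s");
    }
}
```

Planning for the 60-second worst case gives 334 waves and 20,040 s (about 5.6 hours) even though many jobs finish in half a second, which is why this approach is a poor fit for the on-demand path.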
I think using Quartz for this kind of scenario is wrong, since all jobs you spawn should execute immediately and not at a specific time. Like other answers suggest using queues and an executor service would make the most sense here. — Leonard Brünings

yair yair · Accepted Answer · 2013-12-30T13:04:37

In both of these cases, around 10,000 quartz jobs get spawned

No need to spawn new quartz jobs. Quartz is a scheduler - not a task manager.

In the nightly reprocess - you need only that one quartz cron job to invoke some service responsible for managing and running the 10,000 tasks. In the "on demand" scenario, quartz shouldn't be involved at all. Just invoke that service directly.
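A minimal single-JVM sketch of such a service, assuming a plain `java.util.concurrent` thread pool; `ReprocessService`, `reprocessAll`, and the per-id callback are hypothetical names, not part of Quartz. The nightly cron job's `execute` method would call `reprocessAll` once, and the on-demand path would call the same method directly, so no per-task triggers exist that could misfire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongConsumer;

/** Hypothetical service: runs one independent task per domain-object id. */
public class ReprocessService {
    private final int threads;

    public ReprocessService(int threads) {
        this.threads = threads;
    }

    /**
     * Processes every id on a fixed-size pool and blocks until all are done.
     * Returns the number of ids that completed without throwing.
     */
    public int reprocessAll(List<Long> ids, LongConsumer processOne) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (long id : ids) {
                tasks.add(() -> { processOne.accept(id); return null; });
            }
            int ok = 0;
            // invokeAll queues every task and waits; excess tasks simply wait in the queue.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                try {
                    f.get();
                    ok++;
                } catch (ExecutionException e) {
                    // One failing domain object should not stop the others.
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size plays the role of the 30 Quartz threads, but extra work just waits in the executor's queue instead of being tracked as a trigger past its fire time.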

How does the service manage 10,000 tasks?

Typically, when only one JVM is available, you'd just use some ExecutorService. Here, since you have 6 nodes at your disposal, you can easily use Hazelcast. Hazelcast is a Java library that enables you to cluster your nodes, sharing resources efficiently with each other. Hazelcast has a straightforward solution for distributing your ExecutorService, called the Distributed Executor Service. It's as easy as creating a Hazelcast ExecutorService and submitting the task to all members. Here's an example from the documentation for invoking on a single member:

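Callable<String> task = new Echo(input); // Echo is just some Callable
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
IExecutorService executorService = hz.getExecutorService("default");
// target member, e.g. the first one in the cluster
Member member = hz.getCluster().getMembers().iterator().next();
Future<String> future = executorService.submitToMember(task, member);
String echoResult = future.get(); // blocks until the task finishes on that member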
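The snippet above leaves `Echo` undefined. A hypothetical stand-in could look like the following; it only echoes its input, whereas a real task would process one domain object. Hazelcast has to serialize a task to ship it to another member, and plain `java.io.Serializable` is one of the formats it accepts, so the class implements it and holds only serializable fields.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;

/** Hypothetical Echo task: a serializable Callable that returns its input. */
public class Echo implements Callable<String>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String input;

    public Echo(String input) {
        this.input = input;
    }

    @Override
    public String call() {
        // Real work for one domain object would go here; this just echoes.
        return input;
    }
}
```

Because `Echo` is an ordinary `Callable`, the same class also runs unchanged on a local `ExecutorService`, which makes it easy to test before moving it to the distributed executor.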
