0
votes

We have a web application that receives incoming data via RESTful web services running on Jersey/Tomcat/Apache/PostgreSQL. Separately from this web-service application, we have a number of repeating and scheduled tasks that need to be carried out. For example, purging different types of data at different intervals, pulling data from external systems on varying schedules, and generating reports on specified days and times.

So, after reading up on Quartz Scheduler, I see that it seems like a great fit.

My question is: should I design my Quartz-based scheduling application to run in Tomcat (via QuartzInitializerListener), or build it into a standalone application to run as a linux daemon (e.g., via Apache Commons Daemon or the Tanuk Java Service Wrapper).

On the one hand, it strikes me as counterintuitive to use Tomcat to host an application that is not geared towards receiving http calls. On the other hand, I haven't used Apache Commons Daemon or the Java Service Wrapper before, so maybe running inside Tomcat is the path of least resistance.

Are there any significant benefits or dangers with either approach that I should be aware of? Our core modules already take care of data access, logging, etc., so I don't see that those services are much of a factor either way.

Our scheduling will be data driven, so our Quartz-based scheduler will read the relevant data from PostgreSQL. However, if we run the scheduling application within Tomcat, is it possible/reasonable to send messages to our application via http calls to Tomcat? Finally, fwiw, since our jobs will be driven by our existing application data, I don't see any need for the Quartz JDBCJobStore.

1

1 Answers

1
votes

To run a Java standalone application as linux daemon, simply end the java-command with an & -sign so that it runs in the background and put it in an Upstart-script for example.

As for the design: in this case I would go for whatever is easier to maintain. And it looks like running an app in Tomcat is already familiar. One benefit that comes to mind is that configuration files (for the database for example) can be shared/re-used so that only one set of configuration files needs to be maintained.

However, if you think the scheduled tasks can have a significant impact on resource usage, then you might want to run the tasks on a separate (virtual) machine. Since the timing of the tasks is data driven, it is hard to predict the exact load. E.g. it could happen that all the different tasks are executed at the same time (worst case/highest load scenario). Also consider the complexity of the software for the scheduled tasks and the related risk of nasty bugs: if you think there is a low chance of nasty bugs, then running the tasks in Tomcat next to the web-service is a good option, if not, run the tasks as a separate application. Lastly, consider the infrastructure in general: production line systems (providing (a continuous flow of) data processing critical to business) should be separate from non-production line systems. E.g. if the reports are created an hour later than usual and the business is largely unaffected, then this is non-production line. But if the web-service goes down and business is (immediatly) affected, then this is production line. Purging data and pulling updates is a bit gray: depends on what happens if these tasks are not performed, or later.