Current Setup

We have a Kubernetes cluster with 3 pods running a Spring Boot application. Every 12 hours, a Spring Boot scheduler runs a job that fetches some data and caches it. (There is also a queue setup, but I will not go into those details, as my question concerns the setup before we get to the queue.)

Problem

Because we have 3 pods and the scheduler runs at the application level, we make 3 calls for the same data set, and each pod gets a response. The pod that processes and caches it first becomes the master, and the other 2 pods replicate the data from that instance.

I see this as a problem because we plan to add more jobs to fetch more data sets, which will multiply the number of calls made.

I am not from the DevOps side and have limited Azure knowledge, so I need some help from the community.

Need

What are the options available to improve this? I want to separate out the cron schedule so that it runs only once, not once per pod.

1 - Can I keep the cron job at the cluster level? I have read about it here: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/ Will this solve the problem?

2 - I googled and found that another option is to run a CronJob that schedules a job to run to completion. Will that help? I am not sure what it really means.

Thanks in advance for taking the time to read this.

1 Answer

Based on my understanding of your problem, it looks like you have at least the following two choices:

  1. If you keep the scheduling logic within your Spring Boot main app, you may want to explore something like ShedLock. It uses an external lock provider (MySQL, Redis, etc.) to make sure a scheduled job in your app code executes only once, even when the app is running on multiple nodes (or Kubernetes pods, in your case).
  2. If you can separate the scheduler-specific app code into its own executable process (i.e. that code can run in a separate set of pods from your main application pods), then you can leverage a Kubernetes CronJob to schedule a Kubernetes Job, which internally creates pods and runs your application logic. The benefit of this approach is that you can use native Kubernetes CronJob parameters, such as the concurrency policy (`concurrencyPolicy`) and a few others, to ensure the job runs only once per scheduled time, in a single pod.

With approach (1), your scheduler code stays coupled with your main app, and both run together in the same pods.

With approach (2), you'd have to separate the code that runs in the scheduler from the overall application code, containerize it into its own image, and then configure a Kubernetes CronJob schedule with this new image, following the official guide example and Kubernetes CronJob best practices (authored by me, but you can find other examples).
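As a rough illustration of approach (2), a CronJob manifest could look something like the sketch below. The resource name, container name, and image are hypothetical placeholders (they are not from your setup), and the history and retry limits are only example values, so adjust them to your environment.

```yaml
# Minimal sketch of a CronJob for approach (2); names and image are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dataset-refresh                # hypothetical name
spec:
  schedule: "0 */12 * * *"             # at minute 0 of every 12th hour (00:00 and 12:00)
  concurrencyPolicy: Forbid            # skip a new run while the previous one is still running
  successfulJobsHistoryLimit: 3        # keep the last 3 successful Jobs for inspection
  failedJobsHistoryLimit: 1            # keep the last failed Job for debugging
  jobTemplate:
    spec:
      backoffLimit: 2                  # retry a failing Job up to 2 times
      template:
        spec:
          restartPolicy: OnFailure     # Job pods must use OnFailure or Never
          containers:
            - name: dataset-refresh    # hypothetical container name
              # hypothetical image built from the extracted scheduler code
              image: registry.example.com/dataset-refresh:1.0.0
```

This also covers what "a job to completion" means: each scheduled run creates a Job, whose pod starts, does the work, and exits, rather than staying up like your application pods. The Kubernetes documentation notes that in rare cases a scheduled run may be created twice or not at all, so it is worth keeping the job logic idempotent.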

Both approaches have their own merits and demerits, so you can evaluate them and pick the one that best suits your needs.