
I have the following scenario:

  • For each job that needs to be processed, a message containing the job description is sent to an Amazon SQS message queue.
  • I have several worker processes that pick up and carry out the jobs.

The following conditions need to be satisfied:

  1. If one process fails to complete a job (maybe because the server it is running on crashes), the job must be available again to the other processes.
  2. While one job is being worked on, other jobs with the same description have to wait until the first job has finished or has reached its timeout.
  3. The system should be easily scalable according to the message queue length.

To ensure (1), my first idea was to use the message locking functionality provided by Amazon SQS, but how do I then ensure (2)? Assigning each job description to a dedicated process would be an option, but that would make (3) more difficult.

What do you mean when you say, "other jobs with the same description"? We might need more detail here to help. For example, if we are doing image resizing, there would be no point in limiting a system to only one resize operation at a time. - Jeff
I thought it would be clearer to formulate the question rather abstractly, but here are some details: it's about computing patterns for a timeseries (a set of historical data). Jobs with the "same description" are jobs that have to compute patterns for the same timeseries. Whenever the historical data of a timeseries is updated, a pattern recomputation is triggered by sending a corresponding message to the queue. - zero-divisor

1 Answer


Set a longer "VisibilityTimeout" value. Make sure that this value is longer than it typically takes to complete the job.

If a machine that receives this job fails to complete it (or fails to complete it in a timely manner), the message becomes available again for a new machine to handle.
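A minimal sketch of that receive, process, delete cycle, assuming `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`). The names `process_one` and `handle_job` and the 300-second timeout are hypothetical and only for illustration:

```python
def process_one(sqs, queue_url, handle_job, visibility_timeout=300):
    """Receive at most one message, run the job, delete the message on success.

    `sqs` is assumed to be a boto3 SQS client; `handle_job` is a hypothetical
    callable that takes the message body. Returns True if a job was completed,
    False if no message was available.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling: wait up to 20 s for a message
        VisibilityTimeout=visibility_timeout,  # hide the message while we work
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False

    msg = messages[0]
    # If handle_job raises (or the machine crashes), the message is never
    # deleted, so it becomes visible again once the timeout expires.
    handle_job(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```

Deleting the message only after the job has succeeded is what makes the retry behaviour work: a failed or crashed worker simply never reaches `delete_message`.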

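This addresses #1. It also addresses #2 as long as each job description has at most one message in the queue at a time, because the visibility timeout hides only the message that was received, not other messages with the same description.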

For #3, I believe you can set up Auto Scaling triggers based on the length of an SQS queue (for example, a CloudWatch alarm on the queue's ApproximateNumberOfMessagesVisible metric), so that when more messages are waiting, AWS automatically spins up new instances to handle them.
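To illustrate the idea of scaling on queue length, here is a rough sketch of the arithmetic. In practice an Auto Scaling policy driven by a CloudWatch alarm would do this for you. It assumes `sqs` is a boto3 SQS client; `desired_workers`, `jobs_per_worker`, and the min/max limits are hypothetical names and values:

```python
import math


def desired_workers(sqs, queue_url, jobs_per_worker=10,
                    min_workers=1, max_workers=20):
    """Estimate how many workers the current backlog calls for.

    `sqs` is assumed to be a boto3 SQS client. The thresholds are
    illustrative defaults, not recommended values.
    """
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )["Attributes"]
    # SQS returns attribute values as strings.
    backlog = int(attrs["ApproximateNumberOfMessages"])
    wanted = math.ceil(backlog / jobs_per_worker)
    # Clamp to the allowed range so the fleet never scales to zero or runs away.
    return max(min_workers, min(max_workers, wanted))
```

Note that the count reported by SQS is approximate, so a scaling rule built on it should tolerate small fluctuations.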