I have the following scenario:
- For each job that needs to be processed, a message containing the job description is sent to an Amazon SQS message queue.
- I have several worker processes that pick up and do the jobs.
The following conditions need to be satisfied:
1. If a process fails to complete a job (for example, because the server it is running on crashes), the job must become available again to the other processes.
2. While one job is being worked on, other jobs with the same description have to wait until the first job has finished or has reached its timeout.
3. The system should be easy to scale according to the message queue length.
To ensure (1), my first idea was to use the message locking functionality provided by Amazon SQS (its visibility timeout), but how do I then ensure (2)? Assigning processes to a job description would be an option, but that would make (3) more difficult.
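To make conditions (1) and (2) concrete, here is a minimal in-memory sketch of the behaviour I am after. It is only a model and not the SQS API: the class `JobQueue`, its methods, and the injectable `clock` are hypothetical names made up for illustration, and it assumes a single timeout shared by all jobs.

```python
import time
from collections import deque


class JobQueue:
    """In-memory model of the desired semantics (hypothetical, not SQS)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds a worker may hold a job
        self.clock = clock        # injectable so the timeout can be simulated
        self.pending = deque()    # (job_id, description) in arrival order
        self.in_flight = {}       # description -> (job_id, deadline)
        self.next_id = 0

    def send(self, description):
        self.next_id += 1
        self.pending.append((self.next_id, description))
        return self.next_id

    def _expire(self):
        # Condition 1: a job whose worker died becomes available again.
        now = self.clock()
        for desc, (job_id, deadline) in list(self.in_flight.items()):
            if deadline <= now:
                del self.in_flight[desc]
                self.pending.appendleft((job_id, desc))

    def receive(self):
        self._expire()
        # Condition 2: skip jobs whose description is already being worked on.
        for i, (job_id, desc) in enumerate(self.pending):
            if desc not in self.in_flight:
                del self.pending[i]
                self.in_flight[desc] = (job_id, self.clock() + self.timeout)
                return job_id, desc
        return None  # nothing is eligible right now

    def complete(self, job_id, description):
        # Only the current holder may complete; a timed-out worker is ignored.
        if self.in_flight.get(description, (None,))[0] == job_id:
            del self.in_flight[description]
            return True
        return False
```

In this model any number of workers can call `receive()`, so scaling by queue length (3) stays simple. What I do not know is how to get the "skip jobs with a busy description" part on top of a real SQS queue.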