
I am writing a workflow in AWS SWF, and at one point it has the following steps:

DownloadFromS3 -> doSomeProcessing -> UploadResults

My idea is to write each step as a separate activity task and let the decider schedule each one. The problem is: how can I guarantee that the worker that receives the doSomeProcessing task is the same one that downloaded the file? I am running a pool of around 20 workers.

PS1: I know I can create a different task list for every worker and route the tasks to them individually, but this seems like a hack to me and not a proper solution.

PS2: There's an example in the SWF console that has a download and an upload task, but it's written in Java (which I don't understand), and it seems to be written with a single worker in mind.

PS3: I'm currently using a server written in Go that executes all 3 steps and manages the state in between. However, it would be nice to offload the state management to a decider in SWF, because doSomeProcessing is not a trivial task (engineering CFD simulations) and a lot of things can go wrong.

Thanks

Side-note: The modern alternative to Amazon Simple Workflow is AWS Step Functions, which provides a state machine for AWS Lambda functions. Might be worth exploring. – John Rotenstein

Isn't the whole intention that each worker is independent and is simply assigned a task to do? You shouldn't expect that a particular worker gets a particular job. If you need such a relationship, then just have one worker do all the linked tasks, rather than having them called separately. – John Rotenstein

1 Answer


You're right in your assumptions; you basically have two ways to handle this:

- Put all 3 steps in the same SWF activity task. This is what I do at work for the case you describe, because we consider downloading from / uploading to S3 to be trivial things that "just work".
- Split the steps into 3 different activity tasks. Then the only way to guarantee that the same node will be used is to change the task list for the 2nd and 3rd tasks. We also do this for some very long tasks, and it works quite well; see the decider sketch below.
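To make the routing concrete, here is a minimal decider sketch in Go (since you already have a Go server), using the aws-sdk-go v1 SWF client. It assumes the DownloadFromS3 worker returns its own host-specific task list name as the activity result; the domain, task list, and activity names are all placeholders, not anything prescribed by SWF.

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/swf"
)

func main() {
	client := swf.New(session.Must(session.NewSession()))

	for {
		// Long-poll for a decision task on the shared decider task list.
		task, err := client.PollForDecisionTask(&swf.PollForDecisionTaskInput{
			Domain:   aws.String("cfd-domain"), // hypothetical domain
			TaskList: &swf.TaskList{Name: aws.String("decider-list")},
		})
		if err != nil || task == nil || aws.StringValue(task.TaskToken) == "" {
			continue // empty response means the poll timed out; just poll again
		}

		var decisions []*swf.Decision
		for _, ev := range task.Events {
			// This assumes a linear workflow, so any completed activity here
			// is DownloadFromS3; a real decider should follow ScheduledEventId
			// back to see which activity actually finished.
			if aws.StringValue(ev.EventType) != swf.EventTypeActivityTaskCompleted {
				continue
			}
			// The download worker wrote its private task list name into
			// Result, so doSomeProcessing lands on the same node.
			workerList := aws.StringValue(ev.ActivityTaskCompletedEventAttributes.Result)
			decisions = append(decisions, &swf.Decision{
				DecisionType: aws.String(swf.DecisionTypeScheduleActivityTask),
				ScheduleActivityTaskDecisionAttributes: &swf.ScheduleActivityTaskDecisionAttributes{
					ActivityType: &swf.ActivityType{
						Name:    aws.String("doSomeProcessing"),
						Version: aws.String("1.0"),
					},
					ActivityId: aws.String("process-step"),
					TaskList:   &swf.TaskList{Name: aws.String(workerList)},
				},
			})
		}

		client.RespondDecisionTaskCompleted(&swf.RespondDecisionTaskCompletedInput{
			TaskToken: task.TaskToken,
			Decisions: decisions,
		})
	}
}
```

The worker side is symmetric: each node polls both the shared list (for DownloadFromS3) and its own private list (for doSomeProcessing), e.g. one named after its hostname.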

The second option is actually not a hack, and you don't create any persistent resource; a task list is just a routing mechanism. The only downside, in my opinion, is that when task lists are dynamic like that, you have no native way to check whether the backlog on a given list is growing too large, since there is no API for enumerating task lists. You can handle this with a wrapping system, though, or rely on timeouts to alert you when a node cannot keep up, as in the sketch below.
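For the timeout-based alerting, here is a sketch (again in Go, with hypothetical names and values) of scheduling the routed task with a ScheduleToStartTimeout, which turns a node that stopped polling its private list into an explicit timeout event instead of a silently growing backlog:

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/swf"
)

// scheduleOnWorker builds the decision that routes doSomeProcessing to one
// worker's private task list. All timeout values here are illustrative.
func scheduleOnWorker(workerList string) *swf.Decision {
	return &swf.Decision{
		DecisionType: aws.String(swf.DecisionTypeScheduleActivityTask),
		ScheduleActivityTaskDecisionAttributes: &swf.ScheduleActivityTaskDecisionAttributes{
			ActivityType: &swf.ActivityType{
				Name:    aws.String("doSomeProcessing"),
				Version: aws.String("1.0"),
			},
			ActivityId: aws.String("process-" + workerList),
			TaskList:   &swf.TaskList{Name: aws.String(workerList)},
			// If nothing polls workerList within 5 minutes, SWF records an
			// ActivityTaskTimedOut event with TimeoutType SCHEDULE_TO_START;
			// the decider can treat that as "this node cannot keep up" and
			// alert or reschedule elsewhere. Timeouts are strings of seconds.
			ScheduleToStartTimeout: aws.String("300"),
			StartToCloseTimeout:    aws.String("7200"), // generous bound for a long CFD run
			HeartbeatTimeout:       aws.String("120"),  // long tasks should heartbeat so crashes surface quickly
		},
	}
}
```

The heartbeat timeout is worth setting for long CFD runs in particular: a worker that dies mid-simulation shows up within minutes rather than only at the StartToClose bound.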