I am trying to figure out how to set up a work queue with Argo. The Argo Workflows are computationally expensive. We need to plan for many simultaneous requests. The workflow items are added to the work queue via HTTP requests.

The flow can be demonstrated like this:

client  
  => hasura # user authentication  
    => redis # work queue
      => argo events # queue listener
        => argo workflows 
          => redis + hasura # inform that workflow has finished
            => client 

I have never built a K8s cluster that exceeds its resources. Where do I limit the execution of workflows? Or do Argo Events and Argo Workflows limit these according to the resources in the cluster?

The above example could probably be simplified to the following, but the problem is what happens if the processing queue is full?

client
  => argo events # HTTP request listener
    => argo workflows

1 Answer

Argo Workflows has no concept of a queue, so it has no way of knowing when the queue is full. If you need queue control, that should happen before submitting workflows.

Once the workflows are submitted, there are a number of ways to limit resource usage.

  1. Pod resources - each Workflow step is represented by a Kubernetes Pod. You can set resource requests and limits just like you would with a Pod in a Deployment.
  2. Step parallelism limit - within a Workflow, you can limit the number of steps running concurrently. This can help when a step is particularly resource-intensive.
  3. Workflow parallelism limit - you can limit the number of workflows running concurrently by configuring them to use a semaphore.
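
To make the three limits concrete, here is a minimal sketch that builds a Workflow manifest as a Python dict with all three applied. The field names (`resources` on the container, `spec.parallelism`, `spec.synchronization.semaphore.configMapKeyRef`) are as I remember them from the Argo Workflows spec, so check them against the version you run. The function name, the ConfigMap name `workflow-semaphores`, the key `expensive`, and the resource values are placeholders I made up for illustration.

```python
import json


def build_workflow(name_prefix, image, command, max_parallel_steps=2):
    """Return a Workflow manifest (as a dict) with the three limits applied."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": name_prefix},
        "spec": {
            "entrypoint": "main",
            # 2. Step parallelism: cap on Pods running at once in this Workflow.
            "parallelism": max_parallel_steps,
            # 3. Workflow parallelism: Workflows sharing this semaphore run
            #    concurrently only up to the number stored under the given key
            #    of the ConfigMap (placeholder names; the ConfigMap must exist).
            "synchronization": {
                "semaphore": {
                    "configMapKeyRef": {
                        "name": "workflow-semaphores",
                        "key": "expensive",
                    }
                }
            },
            "templates": [
                {
                    "name": "main",
                    "container": {
                        "image": image,
                        "command": command,
                        # 1. Pod resources: same shape as in a Deployment.
                        "resources": {
                            "requests": {"cpu": "500m", "memory": "512Mi"},
                            "limits": {"cpu": "1", "memory": "1Gi"},
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_workflow("expensive-job-", "alpine:3.19", ["echo", "hello"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl create -f -` (`create` rather than `apply`, because the manifest uses `generateName`).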

There are also other performance optimizations, such as setting Workflow and Pod TTLs and offloading the YAML of large Workflows to a database instead of keeping it on the cluster.

As far as I know, there is no way to set a Workflow limit so that Argo will reject additional Workflow submissions until more resources are available. This is a problem if you're worried about Kubernetes etcd filling up with too many Workflow definitions.

To keep from blowing up etcd, you'll need another app of some kind sitting in front of Argo to queue Workflow submissions until more resources become available.
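
As a rough illustration of what such an app could do, here is a minimal in-memory sketch. It is not an Argo feature: `WorkflowGate`, `QueueFull`, and the two injected callables are hypothetical names. `submit` stands for whatever actually creates the Workflow (for example a call to the Argo Server or the Kubernetes API), and `count_active` stands for however you count Workflows that have not finished yet. In the setup from the question, the pending queue would live in Redis rather than in process memory.

```python
from collections import deque


class QueueFull(Exception):
    """Raised when the pending queue cannot accept more requests."""


class WorkflowGate:
    """Holds requests back until fewer than max_active Workflows are active."""

    def __init__(self, submit, count_active, max_active=5, max_pending=100):
        self._submit = submit              # hypothetical: creates one Workflow
        self._count_active = count_active  # hypothetical: unfinished Workflows
        self._max_active = max_active
        self._max_pending = max_pending
        self._pending = deque()

    def enqueue(self, request):
        """Accept a request, or reject it so the client can retry later."""
        if len(self._pending) >= self._max_pending:
            raise QueueFull("work queue is full, try again later")
        self._pending.append(request)

    def drain(self):
        """Submit pending requests while capacity remains; return the count."""
        free = self._max_active - self._count_active()
        submitted = 0
        while self._pending and submitted < free:
            self._submit(self._pending.popleft())
            submitted += 1
        return submitted
```

Call `drain()` on a timer or whenever a Workflow finishes. A rejected `enqueue()` maps naturally to an HTTP 429 or 503 response for the client, which answers the "what happens if the queue is full" part of the question.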