2 votes

In this scenario, I have to poll AWS SQS messages from a queue; each async request can fetch up to 10 SQS items/messages. Once I poll the items, I have to process them on a Kubernetes pod. Item processing includes getting responses from a few API calls, which may take some time, and then saving the item to the DB and S3. I did some R&D and reached the following conclusions:

  1. Use a producer-consumer model: one thread polls items and another thread processes them, or use multi-threading for item processing
  2. Maintain a data structure that contains the polled SQS items ready for processing; the DS could be a BlockingCollection or a ConcurrentQueue
  3. Use the Task Parallel Library for thread pooling and for item processing
  4. Channels could be used
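To illustrate how points 1, 2, and 4 could fit together, here is a rough sketch of the producer-consumer model using System.Threading.Channels. The SQS receive call and the item processing are stubbed out as hypothetical methods; a real implementation would use the AWS SDK's ReceiveMessageAsync with MaxNumberOfMessages = 10.

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

class SqsPipeline
{
    // Bounded channel: the poller waits when processors fall behind,
    // which naturally throttles SQS polling (capacity of 100 is illustrative).
    private readonly Channel<string> _channel =
        Channel.CreateBounded<string>(new BoundedChannelOptions(100)
        {
            SingleWriter = true,   // one polling loop
            SingleReader = false   // several processing loops
        });

    // Producer: polls batches of up to 10 messages and enqueues them.
    public async Task PollLoopAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            string[] batch = await ReceiveBatchAsync(ct); // stand-in for the SQS call
            foreach (var msg in batch)
                await _channel.Writer.WriteAsync(msg, ct);
        }
        _channel.Writer.Complete();
    }

    // Consumer: run several of these loops concurrently on the thread pool.
    public async Task ProcessLoopAsync(CancellationToken ct)
    {
        await foreach (var msg in _channel.Reader.ReadAllAsync(ct))
        {
            await ProcessAsync(msg, ct); // call APIs, then save to DB and S3
        }
    }

    private Task<string[]> ReceiveBatchAsync(CancellationToken ct) =>
        Task.FromResult(new[] { "msg" });   // stub
    private Task ProcessAsync(string msg, CancellationToken ct) =>
        Task.CompletedTask;                 // stub
}
```

The bounded capacity gives back-pressure for free: if downstream APIs slow down, the writer simply waits instead of the queue growing without limit.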

My Queries

  1. What would be the best approach to achieve the best performance or increase TPS?
  2. Can/should I use TPL Dataflow?
  3. Multi-threaded, or single-threaded with async tasks?
How many messages per second are you anticipating? You could do all kinds of magic tricks, but if the rate is, say, 1 msg per second, you can make other choices. — Peter Bons
We want to achieve 500 TPS. I stated it incorrectly that processing through an API call takes seconds; it actually takes milliseconds to get a result from an API call. — panky sharma
Do you need to process them in order? — Peter Bons
No, that's not required. — panky sharma
Could you please share some code that shows how you imagine single-threaded processing? — Peter Csala

2 Answers

0 votes

I'm not familiar with Kubernetes but there are many things to consider when maximising throughput.

Everything you have mentioned is I/O bound, not CPU bound, so using the TPL would overcomplicate the design for marginal benefit. See: https://docs.microsoft.com/en-us/dotnet/csharp/async#recognize-cpu-bound-and-io-bound-work

Your Kubernetes pods are likely to have network limitations. For example, an Azure Function App on a Consumption Plan is limited to 1,200 outbound connections; other services have defined limits too: https://docs.microsoft.com/en-us/azure/azure-functions/manage-connections?tabs=csharp#connection-limit. Given the nature of your work, you will likely hit these limits before you need to process I/O work on multiple threads.

You may also need to consider the limits of the services you depend on and ensure they can handle the throughput.

You may want to consider using a SemaphoreSlim to limit the number of active connections, so that you stay within both your infrastructure and external dependency limits: https://docs.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
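A minimal sketch of that throttling idea, with an arbitrary illustrative limit of 50 concurrent calls and a hypothetical `call` delegate standing in for the real HTTP/DB/S3 work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class Throttled
{
    // Caps concurrent outbound calls; 50 is an illustrative limit,
    // tune it to your infrastructure and dependency quotas.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(50);

    public static async Task CallDependencyAsync(Func<Task> call)
    {
        await Gate.WaitAsync();   // wait for a free slot
        try
        {
            await call();         // the actual outbound call
        }
        finally
        {
            Gate.Release();       // always release, even on failure
        }
    }
}
```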

That being said, 500 messages per second is a realistic target. To push it further, you can look at running multiple processes, each with its own independent resource limits, processing the queue.

0 votes
  1. What would be the best approach to achieve the best performance or increase TPS?

    • I would consider autoscaling your pods based on the SQS queue size. Ready-to-go solutions: KEDA, AWS CloudWatch metrics;
  2. Can/should I use TPL Dataflow?

  3. Multi-threaded or single-threaded with async tasks?

    • Since you are running I/O operations (S3 uploads, DB queries), your thread kicks off the operation and goes back to the pool, waiting for its turn to continue the task. While it's free, it can be used to kick off other work. That said, you don't need two threads to do the job; I would let the task scheduler decide. I would go like:
var dbSaveTask = KickOffTheDbSave();
var s3SaveTask = KickOffTheS3Save();
await Task.WhenAll(dbSaveTask, s3SaveTask);
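On query 2, TPL Dataflow can cover both the queue and the processing loop in one construct. A minimal sketch, assuming a hypothetical `ProcessAsync` for the per-message work; the parallelism and capacity numbers are illustrative:

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class DataflowExample
{
    public static async Task RunAsync(string[] messages)
    {
        // One block does the I/O-bound processing. MaxDegreeOfParallelism
        // bounds how many messages are in flight at once, and
        // BoundedCapacity back-pressures the polling loop.
        var processor = new ActionBlock<string>(
            async msg => await ProcessAsync(msg),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 16, // illustrative value
                BoundedCapacity = 100
            });

        // The polling loop posts each received message into the block;
        // SendAsync waits when the block is at capacity.
        foreach (var msg in messages)
            await processor.SendAsync(msg);

        processor.Complete();
        await processor.Completion;
    }

    private static Task ProcessAsync(string msg) => Task.CompletedTask; // stub
}
```

This gives you the bounded queue, the degree of parallelism, and completion tracking without wiring up threads or collections by hand.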

I am not fully aware of your processes, so these are not recommendations but rather things to consider:

  • Since you're working with AWS and are saving files to S3, you could attach a Lambda triggered by the S3 upload event to then store the item in the database. You can think of it as a kind of transaction: you won't have a record in the database until you have the document. It also segregates the processes, in line with SOLID principles;
  • Consider using a Bulkhead Isolation policy for resource limiting.
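A sketch of that last point using Polly's bulkhead policy; the parallelization and queue numbers are illustrative, and `PersistAsync` is a hypothetical stand-in for the real save:

```csharp
using System.Threading.Tasks;
using Polly;
using Polly.Bulkhead;

class BulkheadExample
{
    // At most 20 concurrent executions, with up to 50 more queued behind
    // them; further calls are rejected instead of piling up.
    private static readonly AsyncBulkheadPolicy Bulkhead =
        Policy.BulkheadAsync(maxParallelization: 20, maxQueuingActions: 50);

    public static Task SaveAsync(string item) =>
        Bulkhead.ExecuteAsync(() => PersistAsync(item));

    private static Task PersistAsync(string item) => Task.CompletedTask; // stub
}
```

Compared with a bare SemaphoreSlim, the bulkhead fails fast when the queue is full, which keeps one slow dependency from starving the rest of the pod.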