8
votes

I have an Azure Logic App which gets triggered when a new file is added or modified in an SFTP server. When that happens the file is copied to Azure Blob Storage and then gets deleted from the SFTP server. This operation takes approximately 2 seconds per file.

The only problem I have is that these files (on average 500 KB) are processed one by one. Given that I'm looking to transfer around 30,000 files daily, this approach becomes very slow (something around 18 hours).

Is there a way to scale out/parallelize these executions?

2
You mentioned: "The only problem I have is that these files (on average 500kb) are processed one by one." By default, a split-on is set on the SFTP trigger, so each file (if multiple ones are detected) will trigger a run instead of one run for all files. Are you not seeing this? - Derek Li
@Derek Yes, each file triggers a separate execution, but the executions are sequential. - Florin D. Preda
That doesn't sound right. Split triggers should execute in parallel. Can you check the "Diagnostics" tab and see if you're getting any "Run Throttled Events"? It could be that they are running in parallel, but because the actions are being throttled, it looks like they are running in sequence. - Derek Li
@FlorinD.Preda Have you had any issues with your Logic App being able to consistently connect to the SFTP server, where you would be getting 'skipped' triggers? - aaronR
@aaronR Yes, I did, but I believe it was the SFTP server rejecting the requests in my case. In any case, I ended up writing the transfer logic in C#. - Florin D. Preda

2 Answers

0
votes

I'm not sure whether Azure Logic Apps offers a way to scale out or parallelize the runs themselves. But in my experience, if the timeliness requirements are not very strict, you could use a For each loop instead: its iterations run in parallel, with a default concurrency of 20 and a maximum of 50.
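As a rough sketch of where that limit lives: in the workflow definition (code view), the concurrency of a For each action is set under `runtimeConfiguration.concurrency.repetitions`. The action names `For_each_file` and `List_files` below are made up for illustration, and the empty `actions` object stands in for the real copy and delete steps.

```json
{
  "For_each_file": {
    "type": "Foreach",
    "foreach": "@body('List_files')",
    "actions": {},
    "runAfter": {},
    "runtimeConfiguration": {
      "concurrency": {
        "repetitions": 50
      }
    }
  }
}
```

The same setting should be reachable in the designer through the loop's settings (concurrency control), so editing the JSON by hand is optional.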

In your case, my suggestion is to split the work into two stages. First, whenever the trigger fires for a new or modified file on the SFTP server, insert a message with the file path as its content into an Azure Storage queue. Then, based on elapsed time or queue length, stop collecting and read the batch of queued messages. Finally, in a For each action, take each message, fetch the corresponding file from the SFTP server, and create the blob.

0
votes

If you're using C#, use Parallel.ForEach, as Tom Sun said. If you go that way, I also recommend using the async/await pattern for the I/O operations (saving to blob). It frees up the executing thread while the file is being saved so it can serve other requests.
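A minimal sketch of the async side of this, under some assumptions. It does not use `Parallel.ForEach`, because that method does not await async lambdas; instead it swaps in `SemaphoreSlim` plus `Task.WhenAll` to cap how many transfers are in flight. `BoundedTransfer`, `RunAsync`, and the `transferOneAsync` delegate are hypothetical names; the delegate stands in for whatever code downloads one file from SFTP and uploads it to blob storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedTransfer
{
    // Runs transferOneAsync for every path, with at most maxParallel
    // transfers in flight at any time. transferOneAsync is a placeholder
    // for the real "download from SFTP, upload to blob" logic.
    public static async Task RunAsync(
        IEnumerable<string> paths,
        Func<string, Task> transferOneAsync,
        int maxParallel)
    {
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = paths.Select(async path =>
        {
            // Wait for a free slot without blocking a thread.
            await gate.WaitAsync();
            try
            {
                await transferOneAsync(path);
            }
            finally
            {
                // Always give the slot back, even if the transfer failed.
                gate.Release();
            }
        }).ToList();

        // Completes when every transfer has finished; rethrows a failure if any occurred.
        await Task.WhenAll(tasks);
    }
}
```

It could be called as `await BoundedTransfer.RunAsync(paths, TransferFileAsync, 20);`, where `TransferFileAsync` is your own method. At roughly 2 seconds per file, 20 concurrent transfers would bring 30,000 files down to something like 50 minutes in the best case, but only if the SFTP server accepts that many simultaneous sessions, so it is worth starting with a low limit.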