2
votes

I have an application in Elixir that needs to receive a large amount of data and then split it into n parts.

These parts must be processed in parallel, but the number of simultaneous workers should be limited. Each worker returns a large array of values as the result of its processing.

The main process, having received the results from all workers, merges everything into a single file.

Is it a good idea to implement the workers with Task? Will there be problems with a worker process having to return a large amount of data?

Or is it perhaps better to make a pool of workers with GenServer and make synchronous calls?

Read your data from a file using File.stream, then pipe it to Stream.chunk_by, and in its function dispatch each data chunk to whatever processor you have set up ahead of time (via Node.spawn) using Elixir's send mechanism. - GavinBrelstaff
Won't each send call wait for the result to come back before sending the next chunk? - Marsel.V
No, you send the data to the process, it works on it in parallel, and it can then send its result on to another node that collects the results for you. Meanwhile you continue with the next chunk of the incoming File.stream. - GavinBrelstaff
Your problem seems like a perfect use case for Flow. - Hauleth
My way is to record all the logs Elixir can produce, even the request log. - YongHao Hu
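The spawn-and-send approach from the comments above can be sketched as follows. This is a self-contained, hypothetical illustration: it uses a plain enumerable instead of File.stream (and local spawn instead of Node.spawn), the chunk size of 100 and the doubling "work" are placeholders, and the ChunkDispatch module name is made up for the example.

```elixir
defmodule ChunkDispatch do
  # A worker loops, processing each chunk it receives and sending the
  # result back to the collector. send/2 never blocks the sender.
  def worker(collector) do
    receive do
      {:chunk, chunk} ->
        send(collector, {:result, Enum.map(chunk, &(&1 * 2))})
        worker(collector)

      :stop ->
        :ok
    end
  end

  def run(enumerable, n_workers) do
    collector = self()
    workers = for _ <- 1..n_workers, do: spawn(fn -> worker(collector) end)

    # Stream the input in chunks and dispatch them round-robin; count
    # the chunks so we know how many results to wait for.
    n_chunks =
      enumerable
      |> Stream.chunk_every(100)
      |> Stream.with_index()
      |> Enum.reduce(0, fn {chunk, i}, acc ->
        send(Enum.at(workers, rem(i, n_workers)), {:chunk, chunk})
        acc + 1
      end)

    # Gather the results (note: arrival order is not guaranteed).
    results =
      for _ <- 1..n_chunks do
        receive do
          {:result, r} -> r
        end
      end

    Enum.each(workers, &send(&1, :stop))
    List.flatten(results)
  end
end
```

Because results arrive in completion order, you would need to tag chunks with an index if the final file must preserve input order; Task.async_stream (see the answer below is not assumed here) handles that bookkeeping for you.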

1 Answer

1
votes

Task.async_stream provides a simple API to split work up and then gather the results, with a limit on the concurrency:

Example from the documentation:

max_concurrency = System.schedulers_online() * 2
stream = Task.async_stream(collection, Mod, :expensive_fun, [], max_concurrency: max_concurrency)
Enum.to_list(stream)
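A fuller, runnable sketch of the same idea, matched to the question: each worker returns a list of values, and the main process flattens them and can write the result to a single file. The Processor module and its squaring function are placeholders for your real processing; replace them with your own code.

```elixir
defmodule Processor do
  # Placeholder for the real work: each worker returns a list of values.
  def expensive_fun(x), do: [x * x]
end

# Bound the number of workers running at the same time.
max_concurrency = System.schedulers_online() * 2

results =
  1..100
  |> Task.async_stream(Processor, :expensive_fun, [],
       max_concurrency: max_concurrency)
  |> Enum.flat_map(fn {:ok, values} -> values end)

# Task.async_stream emits results in input order by default, so the
# main process can now write everything to one file in a single pass:
# File.write!("output.txt", Enum.join(results, "\n"))
```

Note that each worker's result is sent back to the caller as a message, so a very large return value is copied between processes; that works, but for huge payloads it may be cheaper to have workers write partial files and merge them at the end.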