I have a file that's too big to be read all at once, so it must be processed in chunks.
I'm envisioning a background-load system that works on two buffers: one for processing, one for reading into, swapping them continuously.
In Pseudo-Code:
    Read Buf1
    Mark Buf2 as dirty, so the background task fills it with new data
    loop:
        Process Buf1
        if reached end of Buf1:
            Block while Buf2 is still marked dirty
            Swap Buf1 <-> Buf2
            Mark Buf2 as dirty
... and so on.
There are going to be many chunks. Would it be better to dedicate a reading thread to this, or is it "ok" to launch a std::async for every read operation? I've been told that each of these launches its own thread internally, which is expensive.
Yes, it is time critical.
With std::async, the thread used for it might be reused from a thread pool. If you use std::thread you have full control, but you would need to pause your thread and wait for new data, and if you do that the wrong way you might get worse performance than spawning a new thread for each chunk. – t.niese

The std::async implementation uses a thread pool, so it's not bad. It's a good start. But later on I would consider making a global IO worker pool to have control over how many threads are spawned (possibly one pool per physical drive); it's not unusual to have more IO worker threads than total vcores on the machine, and std::async would never do that. – Sopel