This is my first post here, so thank you in advance for your patience.

I have a thread pool whose job queue is much larger than the number of threads.

Process flow:

  1. Init thread pool (M threads)
  2. Put N tasks in the queue (N can be >> M)
  3. Threads start executing tasks; after finishing its current task, a thread automatically takes the next available one.
  4. Synchronization point: all tasks have to be finished.
  5. Data processing (single thread)
  6. Generate tasks based on the processed data OR quit
  7. Goto 2

The problem is the synchronization point. I've implemented a simple semaphore using a counter and a mutex: before step 2 the counter is initialized to the number of tasks to be loaded, and each task decrements it on completion. When the counter reaches zero, the worker thread calls pthread_cond_signal, and the pthread_cond_wait at step 4 catches it.
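For reference, the counter-plus-mutex scheme described above might look roughly like the sketch below. The names batch_t, batch_init, batch_task_done and batch_wait are hypothetical and only illustrate the idea; the queue and the worker loop are left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical completion counter for one batch of tasks. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int             remaining;   /* tasks not yet finished */
} batch_t;

/* Called by the originating thread before the tasks are queued (step 2). */
static void batch_init(batch_t *b, int ntasks) {
    assert(ntasks >= 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->done, NULL);
    b->remaining = ntasks;
}

/* Called by a worker each time it finishes one task. */
static void batch_task_done(batch_t *b) {
    pthread_mutex_lock(&b->lock);
    if (--b->remaining == 0)
        pthread_cond_signal(&b->done);   /* last task: wake the waiter */
    pthread_mutex_unlock(&b->lock);
}

/* Called by the originating thread at the synchronization point (step 4). */
static void batch_wait(batch_t *b) {
    pthread_mutex_lock(&b->lock);
    while (b->remaining > 0)             /* loop guards against spurious wakeups */
        pthread_cond_wait(&b->done, &b->lock);
    pthread_mutex_unlock(&b->lock);
}
```

Every finished task takes and releases the mutex once, which is the per-task overhead in question.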

I feel this is not the most efficient way to do it (I don't like the lock/unlock in each thread for the counter decrement; it's a big overhead, especially if the task payload is small), but I can't figure out how to improve it. I am aware of barriers, but I can't call pthread_barrier_wait in the threads because they have to be reused multiple times before the sync event occurs.

What about pthread_spin_lock on the number of tasks in the queue? Even if the queue is empty, that doesn't mean the threads aren't running: they may be on the last M tasks. I can't join the threads because they are to be reused in the next cycle.

I would appreciate any input/ideas. Thank you.

1 Answer

Well, you could optimize it a bit, perhaps, by using an atomic decrement instruction on the counter instead of a mutex. The thread that decrements the counter to zero can then call some 'OnComplete(something)' method/function that could, perhaps, signal the originating thread (much as you are doing now).
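A minimal sketch of that idea, assuming C11 <stdatomic.h> is available alongside pthreads. The names abatch_t, abatch_init, abatch_task_done and abatch_wait are made up for illustration, and the mutex/condition-variable signal stands in for whatever 'OnComplete' would actually do. Only the thread that finishes the last task touches the mutex; all other completions are a single atomic decrement.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical batch tracker: atomic counter plus a one-shot completion flag. */
typedef struct {
    atomic_int      remaining;   /* tasks not yet finished */
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int             finished;    /* protected by lock */
} abatch_t;

static void abatch_init(abatch_t *b, int ntasks) {
    assert(ntasks > 0);          /* with zero tasks nobody would ever signal */
    atomic_init(&b->remaining, ntasks);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->done, NULL);
    b->finished = 0;
}

/* Called by a worker after each task. Returns 1 if it was the last task. */
static int abatch_task_done(abatch_t *b) {
    /* atomic_fetch_sub returns the value held before the subtraction */
    if (atomic_fetch_sub(&b->remaining, 1) != 1)
        return 0;                /* common case: no lock taken */
    /* Last task: this is where an OnComplete-style callback would run. */
    pthread_mutex_lock(&b->lock);
    b->finished = 1;
    pthread_cond_signal(&b->done);
    pthread_mutex_unlock(&b->lock);
    return 1;
}

/* Called by the originating thread at the synchronization point. */
static void abatch_wait(abatch_t *b) {
    pthread_mutex_lock(&b->lock);
    while (!b->finished)         /* loop guards against spurious wakeups */
        pthread_cond_wait(&b->done, &b->lock);
    pthread_mutex_unlock(&b->lock);
}
```

The originating thread would call abatch_init before queueing each new set of tasks and abatch_wait at the synchronization point. The waiter still blocks and wakes once per batch, so this removes the per-task locking but not the context switches.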

It's just not worth threading off trivial tasks, no matter how you design your queue and barrier/rendezvous mechanism. There will usually be two context switches anyway: one when your originating thread enters the wait and one when it runs again after the tasks are completed, and more if the pool is not busy and the pool threads are blocked on the queue. Spinning on the completion count will take one core away from your pool, which is not likely to help much (especially if you design your pool to run more than one set of tasks in parallel, since there would then be more than one originating thread spinning).

Don't thread off tasks that involve only trivial CPU work. If there is a lot of CPU work to be done, or the tasks perform blocking operations and there are many more threads than cores, fine.