This is my first post here; thank you in advance for your patience.
I have a thread pool whose job queue holds many more tasks than there are threads.
Process flow:
1. Initialize the thread pool (M threads).
2. Put N tasks in the queue (N can be >> M).
3. Threads start executing tasks; after finishing its current task, a thread automatically takes the next available one.
4. Synchronization point: all tasks must be finished.
5. Data processing (single thread).
6. Generate new tasks based on the processed data, OR quit.
7. Go to step 2.
The problem is the synchronization point. I've implemented a simple semaphore using a counter and a mutex: before step 2 the counter is initialized to the number of tasks to be loaded, and each task decrements it on completion. When the counter reaches zero, the worker thread calls pthread_cond_signal, and the pthread_cond_wait at step 4 catches it.
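To make the scheme concrete, here is a minimal sketch of the counter + mutex + condition variable approach described above. All names (completion_t, completion_arm, demo_worker, demo_cycle and so on) are hypothetical and chosen only for illustration; the real pool and queue are left out, and the demo workers simply report a fixed number of finished tasks each.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical completion counter; names are illustrative only. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_done;
    size_t          remaining;   /* tasks not yet finished in this cycle */
} completion_t;

static void completion_init(completion_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->all_done, NULL);
    c->remaining = 0;
}

/* Before step 2: the main thread records how many tasks will be queued. */
static void completion_arm(completion_t *c, size_t n)
{
    pthread_mutex_lock(&c->lock);
    c->remaining = n;
    pthread_mutex_unlock(&c->lock);
}

/* Called by a worker after every finished task (one lock/unlock per task). */
static void completion_task_done(completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    if (--c->remaining == 0)
        pthread_cond_signal(&c->all_done);
    pthread_mutex_unlock(&c->lock);
}

/* Step 4: the main thread blocks until the counter reaches zero. */
static void completion_wait(completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->remaining != 0)          /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->all_done, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Demo only: each worker "finishes" per_worker tasks. */
typedef struct { completion_t *c; size_t per_worker; } worker_arg_t;

static void *demo_worker(void *p)
{
    worker_arg_t *a = p;
    for (size_t i = 0; i < a->per_worker; i++)
        completion_task_done(a->c);    /* the task body would run before this */
    return NULL;
}

enum { DEMO_MAX_THREADS = 64 };

/* Runs one cycle and returns the counter value seen after the wait. */
static size_t demo_cycle(size_t m, size_t per_worker)
{
    completion_t c;
    pthread_t tid[DEMO_MAX_THREADS];
    worker_arg_t arg = { &c, per_worker };

    if (m > DEMO_MAX_THREADS)
        m = DEMO_MAX_THREADS;
    completion_init(&c);
    completion_arm(&c, m * per_worker);
    for (size_t i = 0; i < m; i++) {
        int rc = pthread_create(&tid[i], NULL, demo_worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    completion_wait(&c);
    size_t left = c.remaining;         /* no writers remain at this point */
    for (size_t i = 0; i < m; i++)
        pthread_join(tid[i], NULL);    /* joined only to tear the demo down */
    pthread_cond_destroy(&c.all_done);
    pthread_mutex_destroy(&c.lock);
    return left;
}
```

Calling demo_cycle(4, 1000) starts four workers that together report 4000 completions; the join at the end exists only so the demo can clean up, whereas in the real pool the threads would stay alive for the next cycle.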
I feel this is not the most efficient way to do it (I don't like the lock/unlock in every thread for each counter decrement; it's a big overhead, especially when the task payload is small), but I can't think of how to improve it. I am aware of barriers, but I can't call pthread_barrier_wait in the worker threads because each thread has to be reused multiple times before the sync event occurs.
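For comparison, here is a sketch of what avoiding the per-task lock might look like, assuming C11 <stdatomic.h> is available. This is a different technique from the mutex-protected counter: the decrement is an atomic operation, and only the worker that finishes the last task takes the mutex to signal. All names (atomic_completion_t, ac_task_done, ac_demo_cycle and so on) are hypothetical, and whether this is actually faster for small tasks is not established here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical atomic completion counter; names are illustrative only. */
typedef struct {
    atomic_size_t   remaining;   /* tasks not yet finished in this cycle */
    pthread_mutex_t lock;        /* used only for the final wakeup */
    pthread_cond_t  all_done;
} atomic_completion_t;

static void ac_init(atomic_completion_t *c)
{
    atomic_init(&c->remaining, 0);
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->all_done, NULL);
}

/* Before step 2: record how many tasks will be queued. */
static void ac_arm(atomic_completion_t *c, size_t n)
{
    atomic_store(&c->remaining, n);
}

/* Called by a worker after every finished task; lock-free except for the
 * single worker that completes the last task. */
static void ac_task_done(atomic_completion_t *c)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    if (atomic_fetch_sub(&c->remaining, 1) == 1) {
        /* Taking the lock before signalling means the waiter is either
         * already inside pthread_cond_wait or has not yet checked the
         * counter, so the wakeup cannot be lost. */
        pthread_mutex_lock(&c->lock);
        pthread_cond_signal(&c->all_done);
        pthread_mutex_unlock(&c->lock);
    }
}

/* Step 4: the main thread blocks until the counter reaches zero. */
static void ac_wait(atomic_completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    while (atomic_load(&c->remaining) != 0)
        pthread_cond_wait(&c->all_done, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Demo only: each worker "finishes" per_worker tasks. */
typedef struct { atomic_completion_t *c; size_t per_worker; } ac_arg_t;

static void *ac_demo_worker(void *p)
{
    ac_arg_t *a = p;
    for (size_t i = 0; i < a->per_worker; i++)
        ac_task_done(a->c);            /* the task body would run before this */
    return NULL;
}

enum { AC_MAX_THREADS = 64 };

/* Runs one cycle and returns the counter value seen after the wait. */
static size_t ac_demo_cycle(size_t m, size_t per_worker)
{
    atomic_completion_t c;
    pthread_t tid[AC_MAX_THREADS];
    ac_arg_t arg = { &c, per_worker };

    if (m > AC_MAX_THREADS)
        m = AC_MAX_THREADS;
    ac_init(&c);
    ac_arm(&c, m * per_worker);
    for (size_t i = 0; i < m; i++) {
        int rc = pthread_create(&tid[i], NULL, ac_demo_worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    ac_wait(&c);
    size_t left = atomic_load(&c.remaining);
    /* Join before destroying: the last worker may still hold the mutex. */
    for (size_t i = 0; i < m; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&c.all_done);
    pthread_mutex_destroy(&c.lock);
    return left;
}
```

One caveat with this sketch: the waiter can observe zero and return before the last worker has issued its signal, so a stale signal may arrive in the next cycle. The while loop in ac_wait makes that harmless, but the structure must outlive the workers, which holds for a long-lived pool.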
What about a pthread_spin_lock on the number of tasks in the queue? The problem is that an empty queue doesn't mean the threads are idle: they may still be running the last M tasks. I also can't join the threads, because they are reused in the next cycle.
I would appreciate any input/ideas. Thank you.