2
votes

I have a C++ app in which I create pthreads to run user-provided functions. I want to be alerted in some way when a thread exits so that I can remove it from the array of pthread_t that I use to keep track of the threads. Is there a way to do this, or should the function just set some "magic value"? Because my main code that spawns the pthreads is in a sort of run loop, I can easily check for an exit condition.


Also, is using a std::vector<pthread_t> to keep track of my threads overkill? The number of threads is not constant; many threads or very few could be running. Or is there another STL container that would suit these additions and deletions (additions always at one end, deletions almost anywhere)? Is there some other structure for keeping track of pthreads? Would a stack or a list be right here? Or would a standard C array with a generous maximum be good enough? Due to the nature of the problem, I could also maintain a fixed-size array of worker threads to which I pass the user functions that must be executed. Is this a good solution?

Sorry for the long confused question, but I have only worked with threading in dynamic languages where this would never be an issue.


EDIT (3/08/12): After reading @jojojapan's answer, I have decided to use a thread pool of sorts. In my structure, I have one producer (a thread in a run loop) and many consumers (the worker threads in the pool). Is there a data structure that is made for multithreaded one-producer, many-consumer use? Or should I just use a std::queue with a pthread_mutex_t on it?

3
When you say "to run user provided functions", is it you who are writing the thread funcs? - Duck
Dynamic languages? You mean GC languages ;). - J.N.
@Duck: No. I'm writing a library, and the programmer who uses it writes these functions. - Linuxios
@J.N.: Yes, that is what I mean. Ruby and C#, to be specific. - Linuxios

3 Answers

3
votes
  1. One option you might want to consider is not to end and delete threads once they have finished a task, but instead to keep them alive and have them wait for a new task to be assigned. You can accomplish this by doing two things:

    1. Use an (almost) infinite loop in the thread
    2. Use a concurrent queue or some other technique that makes them wait for a signal to be given by another thread. Design patterns and strategies are discussed in several SO questions, e.g. this one
  2. If you really want to be notified once a thread ends, you can use a pthread_cond_t and call pthread_cond_signal on it just before the thread reaches its return statement. This assumes that some other thread waits on that condition variable and reacts by removing the corresponding thread from the vector. A condition variable does not record who signalled it, so the exiting thread also has to store its id somewhere shared, under the same mutex. Details on the usage are described on the corresponding man page, but also in this SO post.
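The second option can be sketched roughly as follows. This is only an illustration under assumptions: ExitNotifier, ThreadTask, notifying_entry, reap_finished and spawn_and_reap are made-up names, and the "user function" is a trivial stand-in. Each thread records its own id in a shared list and signals the condition variable while holding the mutex; the waiting side joins whatever has been announced.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of them come from the original post.
struct ExitNotifier {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    std::vector<pthread_t> finished;  // ids of threads that are about to return

    ExitNotifier() {
        pthread_mutex_init(&lock, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~ExitNotifier() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&lock);
    }
};

struct ThreadTask {
    ExitNotifier* notifier;
    void (*user_fn)(void*);
    void* user_arg;
};

// Wrapper passed to pthread_create: runs the user function, then announces the exit.
void* notifying_entry(void* p) {
    ThreadTask* task = static_cast<ThreadTask*>(p);
    task->user_fn(task->user_arg);
    ExitNotifier* n = task->notifier;
    pthread_mutex_lock(&n->lock);
    n->finished.push_back(pthread_self());
    pthread_cond_signal(&n->cond);
    pthread_mutex_unlock(&n->lock);
    return nullptr;
}

// Blocks until at least one thread has announced its exit, joins all announced
// threads, and returns how many were joined.
std::size_t reap_finished(ExitNotifier& n) {
    pthread_mutex_lock(&n.lock);
    while (n.finished.empty())            // loop guards against spurious wakeups
        pthread_cond_wait(&n.cond, &n.lock);
    std::vector<pthread_t> done;
    done.swap(n.finished);
    pthread_mutex_unlock(&n.lock);
    for (pthread_t t : done) pthread_join(t, nullptr);
    return done.size();
}

void set_flag(void* p) { *static_cast<int*>(p) = 1; }  // stand-in user function

// Spawns `count` threads and reaps them as they finish; returns the number reaped.
std::size_t spawn_and_reap(std::size_t count) {
    ExitNotifier notifier;
    std::vector<int> flags(count, 0);
    std::vector<ThreadTask> tasks(count);
    for (std::size_t i = 0; i < count; ++i) {
        tasks[i] = ThreadTask{&notifier, set_flag, &flags[i]};
        pthread_t t;
        int rc = pthread_create(&t, nullptr, notifying_entry, &tasks[i]);
        assert(rc == 0);
        (void)rc;
    }
    std::size_t reaped = 0;
    while (reaped < count) reaped += reap_finished(notifier);
    return reaped;
}
```

In a run loop you would probably not block in reap_finished; checking whether the finished list is non-empty under the mutex on each iteration would serve the same purpose.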

Edit related to the comment and the edited part of the question:

  1. Regarding the number of worker threads: that depends on which resource the threads use most. If they mostly compute and touch a little memory, in other words if they are CPU-bound, it makes sense to use about as many threads as your CPU can run at once. A CPU has a certain number of cores and a certain number of hardware threads per core, and beyond that total, threads start slowing each other down. The (software) threads you create should number about the same, or perhaps a few more (up to twice the number of hardware threads is reasonable, according to what @Tudor says here). However, if your threads make heavy use of memory (memory-bound), the hard disk (IO-bound), or other resources such as the network, NFS, or some other server, you might want to use fewer threads so that they (a) do not block each other and (b) do not put an unreasonable load on those resources. Determining the right number of threads may take some experimenting, and keeping the number configurable is generally a good idea.

  2. Regarding the best data structure to store work tasks: the concurrent bounded queue mentioned in the comments of the post I cited above is probably very good, though I haven't tried it myself. If you'd like to keep things simple, a standard std::queue, or even a std::vector, would not be a bad choice, as long as you protect it properly with a mutex and a condition variable.
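Both points can be combined in one rough sketch: a std::queue guarded by a pthread mutex and condition variable, drained by a pool whose size is configurable. Everything named here (Job, JobQueue, pick_worker_count, worker_main, run_pool) is hypothetical, and the cap at twice the hardware thread count simply mirrors the rule of thumb quoted above; treat it as a starting point for experiments, not a recommendation.

```cpp
#include <pthread.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <thread>
#include <vector>

// All names below are illustrative assumptions, not an existing library API.
struct Job { void (*fn)(void*); void* arg; };

class JobQueue {
public:
    JobQueue() {
        pthread_mutex_init(&lock_, nullptr);
        pthread_cond_init(&not_empty_, nullptr);
    }
    ~JobQueue() {
        pthread_cond_destroy(&not_empty_);
        pthread_mutex_destroy(&lock_);
    }
    JobQueue(const JobQueue&) = delete;
    JobQueue& operator=(const JobQueue&) = delete;

    void push(Job job) {                      // producer side
        pthread_mutex_lock(&lock_);
        jobs_.push(job);
        pthread_cond_signal(&not_empty_);     // wake one waiting worker
        pthread_mutex_unlock(&lock_);
    }
    void close() {                            // no more jobs will arrive
        pthread_mutex_lock(&lock_);
        closed_ = true;
        pthread_cond_broadcast(&not_empty_);  // wake every worker so it can exit
        pthread_mutex_unlock(&lock_);
    }
    bool pop(Job& out) {                      // false once closed and drained
        pthread_mutex_lock(&lock_);
        while (jobs_.empty() && !closed_)
            pthread_cond_wait(&not_empty_, &lock_);
        bool have = !jobs_.empty();
        if (have) { out = jobs_.front(); jobs_.pop(); }
        pthread_mutex_unlock(&lock_);
        return have;
    }
private:
    pthread_mutex_t lock_;
    pthread_cond_t not_empty_;
    std::queue<Job> jobs_;
    bool closed_ = false;
};

// configured == 0 means "auto"; otherwise cap at twice the hardware threads.
unsigned pick_worker_count(unsigned configured,
                           unsigned hw = std::thread::hardware_concurrency()) {
    if (hw == 0) hw = 1;  // hardware_concurrency() may return 0 when unknown
    return configured == 0 ? hw : std::min(configured, 2 * hw);
}

// The (almost) infinite loop each worker runs: wait for a job, execute it, repeat.
void* worker_main(void* p) {
    JobQueue* queue = static_cast<JobQueue*>(p);
    Job job;
    while (queue->pop(job)) job.fn(job.arg);
    return nullptr;
}

void mark_done(void* p) { *static_cast<int*>(p) = 1; }  // stand-in user function

// Runs job_count trivial jobs on the pool and returns how many completed.
int run_pool(unsigned configured_workers, std::size_t job_count) {
    JobQueue queue;
    std::vector<pthread_t> pool(pick_worker_count(configured_workers));
    for (pthread_t& t : pool) {
        int rc = pthread_create(&t, nullptr, worker_main, &queue);
        assert(rc == 0);
        (void)rc;
    }
    std::vector<int> done(job_count, 0);
    for (int& slot : done) queue.push(Job{mark_done, &slot});
    queue.close();
    for (pthread_t t : pool) pthread_join(t, nullptr);
    int total = 0;
    for (int d : done) total += d;
    return total;
}
```

With this shape the pool threads never need to be removed from a container while the program runs; they are joined once, after close().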

1
votes

Consider changing strategy entirely and using an existing thread pool library. It will do the job for you, and you will save yourself a lot of not-so-fun debugging.

Boost.thread pool is one of many, link.

1
votes

A simple way to do this is to just use a pipe.

Open the pipe before spawning the threads. Pass the pipe's write fd as part of your thread data. Before a thread exits, have it write its pthread_self() to the pipe. Have main or a separate thread on the read end of the pipe. It reads the dead thread's id and immediately does a pthread_join. (If it is a separate reaper thread, it can just block on the pipe read; if it is in main, just make the read fd part of your select/poll or whatever.)
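A minimal sketch of the pipe idea, assuming POSIX and using made-up names (PipeTask, piped_entry, reap_from_pipe, spawn_with_pipe). It relies on the POSIX guarantee that pipe writes of at most PIPE_BUF bytes are atomic, so ids written by different threads do not interleave. Error handling (for example EINTR on read) is deliberately thin here.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread data: the pipe's write end plus the user's function.
struct PipeTask {
    int write_fd;
    void (*user_fn)(void*);
    void* user_arg;
};

// Wrapper passed to pthread_create: runs the user function, then reports its own id.
void* piped_entry(void* p) {
    PipeTask* task = static_cast<PipeTask*>(p);
    task->user_fn(task->user_arg);
    pthread_t self = pthread_self();
    // sizeof(pthread_t) is far below PIPE_BUF, so this write is atomic.
    ssize_t written = write(task->write_fd, &self, sizeof self);
    (void)written;
    return nullptr;
}

// Reads thread ids from the pipe and joins each one; returns the number joined.
std::size_t reap_from_pipe(int read_fd, std::size_t expected) {
    std::size_t joined = 0;
    while (joined < expected) {
        pthread_t dead;
        ssize_t got = read(read_fd, &dead, sizeof dead);  // blocks until an id arrives
        if (got != static_cast<ssize_t>(sizeof dead)) break;  // error or closed pipe
        if (pthread_join(dead, nullptr) == 0) ++joined;
    }
    return joined;
}

void do_nothing(void*) {}  // stand-in user function

// Spawns `count` threads without storing their ids anywhere, then reaps them.
std::size_t spawn_with_pipe(std::size_t count) {
    int fds[2];
    if (pipe(fds) != 0) return 0;
    std::vector<PipeTask> tasks(count);
    for (std::size_t i = 0; i < count; ++i) {
        tasks[i] = PipeTask{fds[1], do_nothing, nullptr};
        pthread_t t;
        int rc = pthread_create(&t, nullptr, piped_entry, &tasks[i]);
        assert(rc == 0);
        (void)rc;
    }
    std::size_t joined = reap_from_pipe(fds[0], count);
    close(fds[0]);
    close(fds[1]);
    return joined;
}
```

In a real run loop, fds[0] would be added to the select/poll set instead of being read in a blocking loop.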

This gives you the flexibility of not using a data structure to save the TIDs at all if you don't want to. If you do want to save them, then a list or a map is a better choice than a vector.

If you have main starting the threads and a separate 'reaper' thread collecting them, and you want to save them in some structure, then you will need to synchronize access to the structure between the two.