0
votes

I have a program implemented using the producer-consumer pattern. The producer fetches data from a database based on a list of queries and puts it in an ArrayBlockingQueue. The consumer prepares an Excel report from the data in that queue. To improve performance, I want a dynamic number of producers and consumers: for example, more producers when the producers are slow, and more consumers when the consumers are slow. How can I make the number of producers and consumers dynamic?

1
What have you tried and what are you having trouble with? You make the number of producers/consumers dynamic by creating more or stopping existing ones as needed, in the same way you start/stop them now. - Peter Lawrey

1 Answer

1
votes

If you do this, you must first ask yourself a few questions:

  • How will you make sure that multiple parallel producers put items in the queue in the correct order? This may or may not be possible, depending on the kind of problem you are dealing with.
  • How will you make sure that multiple parallel consumers don't "steal" each other's items from the queue? Again, this depends on your problem: in some cases it is desirable, in others it is forbidden. You didn't provide enough information, but typically, if you are preparing data for a report, you need a single consumer that waits until the report data is complete.
  • Is this actually going to achieve any speedup? Did you measure that the bottleneck is I/O on the producer side, or are you just assuming? If the work is CPU-bound, extra threads will not help once every core is already busy.
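One cheap way to check which side is the slow one before adding threads is to sample how full the queue is while the program runs. The sketch below is only an illustration: `QueueFillProbe` and `averageFill` are made-up names, and the capacity is passed in explicitly because `size()` and `remainingCapacity()` cannot be read atomically together.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueFillProbe {
    // Samples the queue a few times and returns the average fill ratio (0.0 to 1.0).
    // Hypothetical helper, not part of the JDK.
    static double averageFill(BlockingQueue<?> queue, int capacity, int samples, long pauseMillis) {
        double total = 0;
        int taken = 0;
        for (int i = 0; i < samples; i++) {
            total += (double) queue.size() / capacity;
            taken++;
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling early, keep what we have
                break;
            }
        }
        return taken == 0 ? 0.0 : total / taken;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        queue.add("a");
        queue.add("b");
        queue.add("c");
        // In a real run this would be called from a monitoring thread
        // while producers and consumers are working.
        System.out.println(averageFill(queue, 4, 5, 10));
    }
}
```

As a rough rule, a queue that is almost always empty suggests the consumer is waiting on slow producers, and a queue that is almost always full suggests the producers are blocked on a slow consumer. It does not tell you whether the slow side is I/O-bound or CPU-bound; for that you still need a profiler or timing around the database calls.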

So, assuming that you need complete data for the report (i.e. a single consumer that needs the full data), that your data can be "sharded" into independent subsets, and that the bottleneck is in fact where you think it is, you could do it like this:

  1. Because multiple producers will be producing different parts of the result, the pieces will not arrive sequentially. So a plain list is not a good option; you need a data structure that stores interim results and keeps track of which ranges have been completed and which are still missing. Alternatively, you could use one list per producer as a buffer and have a "merge" thread that writes to a single output list for the consumer.
  2. You need to split the input data into several input pieces (one per producer).
  3. You need to somehow track the ordering and ensure that the consumer takes the pieces out in the correct order.
  4. You can start the consumer as soon as the first output piece comes out.
  5. You must stop the consumer when the last piece has been produced.
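The steps above can be sketched roughly as follows. This is a minimal illustration under assumptions, not a drop-in solution: `OrderedPipeline`, `Piece` and `fetchRows` are made-up names, `fetchRows` stands in for the real database call, and each query is treated as one independent piece. Every piece carries the index of its query, so the single consumer can buffer early arrivals and emit rows in the original order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OrderedPipeline {
    // One produced piece: the index of the query it came from plus its rows.
    static final class Piece {
        final int index;
        final List<String> rows;
        Piece(int index, List<String> rows) { this.index = index; this.rows = rows; }
    }

    // Hypothetical stand-in for the real database fetch.
    static List<String> fetchRows(String query) {
        return List.of(query + "-row1", query + "-row2");
    }

    static List<String> run(List<String> queries, int producerCount) {
        BlockingQueue<Piece> queue = new ArrayBlockingQueue<>(16);
        ExecutorService producers = Executors.newFixedThreadPool(producerCount);
        // Step 2: one task per query, spread over producerCount threads.
        for (int i = 0; i < queries.size(); i++) {
            final int index = i;
            producers.execute(() -> {
                try {
                    queue.put(new Piece(index, fetchRows(queries.get(index))));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        producers.shutdown(); // no new tasks; submitted ones still run

        // Steps 1, 3, 4, 5: the single consumer (this thread) starts right away,
        // parks out-of-order pieces in a map, and stops after the last index.
        List<String> report = new ArrayList<>();
        Map<Integer, Piece> pending = new HashMap<>();
        int next = 0;
        try {
            while (next < queries.size()) {
                Piece piece = queue.take();
                pending.put(piece.index, piece);
                while (pending.containsKey(next)) {
                    report.addAll(pending.remove(next).rows);
                    next++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("q0", "q1", "q2"), 2));
    }
}
```

Note that the sketch has no error handling for a failed fetch: if a producer throws, the consumer would wait forever for the missing index, so real code needs a failure marker or a timeout on `take`.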

In short, this is the kind of problem for which you should probably think about using something like MapReduce.
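To show the map/reduce shape without a full framework, here is a small sketch that uses Java parallel streams instead of an actual MapReduce system. `ParallelFetch` and `fetchRows` are made-up names, and `fetchRows` again stands in for the database call. The "map" step runs the queries in parallel, and the "reduce" step collects the rows into one list; because the source list is ordered, the collected result keeps the order of the queries.

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelFetch {
    // Hypothetical stand-in for the real database fetch.
    static List<String> fetchRows(String query) {
        return List.of(query + "-row1", query + "-row2");
    }

    static List<String> fetchAll(List<String> queries) {
        return queries.parallelStream()
                .map(ParallelFetch::fetchRows)   // map: one result list per query, in parallel
                .flatMap(List::stream)           // flatten the per-query lists
                .collect(Collectors.toList());   // reduce: merge in encounter order
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("a", "b", "c")));
    }
}
```

Parallel streams run on the common fork-join pool, which is sized for CPU work, so for blocking database calls a dedicated executor is often a better fit. Unlike the queue-based version, the report can only be written once all rows have been collected.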