I'm trying to get a better understanding of the whole concept of parallel processing and have set up some test cases. After playing with the tests, I see that using async method calls within a Dataflow ActionBlock (or TransformBlock) does not positively affect performance; it just complicates the code. Am I right in assuming that if I'm using Dataflow blocks, the code within them does not have to be asynchronous, and that Dataflow will make it asynchronous by itself? Or am I missing the point?
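To make it concrete, here is a stripped-down sketch of the kind of test I mean (Process is just a placeholder for the real CPU-bound work, not my actual code; it needs the System.Threading.Tasks.Dataflow package):

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Plain synchronous delegate: Dataflow already runs each item on a thread-pool thread.
var syncBlock = new ActionBlock<int>(item => Process(item));

// "Async" version of the same work: wrapped in Task.Run and awaited.
var asyncBlock = new ActionBlock<int>(async item => await Task.Run(() => Process(item)));

for (int i = 0; i < 100; i++)
{
    syncBlock.Post(i);
    asyncBlock.Post(i);
}

syncBlock.Complete();
asyncBlock.Complete();
await Task.WhenAll(syncBlock.Completion, asyncBlock.Completion);

// Stand-in for the real CPU-bound work.
static void Process(int item)
{
    for (int n = 0; n < 1000000; n++) { }
}
```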
2 Answers
TPL Dataflow doesn't enable concurrency or parallelization (although you can get things like ActionBlock to parallelize its processing); it's something that concurrent or parallel code uses to communicate data. Among other things, it's a mechanism for message passing--which is an alternative to shared data. Shared data, when used by multiple threads, requires expensive synchronization. Message passing, when done right, doesn't need synchronization, because the data that needs to be worked on is encapsulated in a message that is "sent" to the code that will work on it.
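As a rough sketch of that message-passing idea (the accumulator below is just an illustration, not anything from your question): all mutation of the shared value happens inside one block, so nothing else needs a lock.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// All updates to 'total' happen inside this one block, one message at a time
// (MaxDegreeOfParallelism defaults to 1), so no lock is needed even though
// many threads post to it concurrently.
long total = 0;
var accumulator = new ActionBlock<long>(value => total += value);

// Producers on any thread just send messages instead of touching shared state.
Parallel.For(0, 1000, i => accumulator.Post(i));

accumulator.Complete();
await accumulator.Completion;

Console.WriteLine(total); // 499500
```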
TPL Dataflow is something you can use if you have a specific design. If you are not specifically implementing something like Actor-based programming, or message-passing, or non-blocking producer/consumer scenarios, then TPL Dataflow will likely complicate things.
If you think you might want to design a system like this, there are some good resources for understanding TPL Dataflow (TDF), like a video by Stephen Toub (a member of the parallel team at Microsoft), as well as the Dataflow MSDN page.
UPDATE:
You can set the maximum degree of parallelism for a block, but setting it higher than the number of CPUs or cores is often counter-productive. The assumption is that each action that executes is more or less CPU-bound (using the CPU at 100% while it's running). If the actions spend a lot of time waiting (on a wait handle, on messages in a message pump--which isn't normal for an action but is for a UI thread--etc.), then a degree of parallelism beyond the number of CPUs might make sense (although that would be hard to tweak).

When you have more CPU-bound actions running than CPUs, you really start to tax the OS. The OS wants to give CPU time to each thread (or each action in this case) because it's "running". When there aren't enough CPUs to go around, the OS starts to preemptively multitask, round-robin giving CPU time to each active thread. Each time the OS takes the CPU away from one thread and gives it to another is a context switch, and context switches are really expensive (in the range of 2000-8000 CPU cycles). So the OS ends up spending its time context switching rather than running your actions.
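For example, capping a block at the number of cores would look something like this (the busy loop is just a stand-in for real CPU-bound work):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// One concurrently executing action per core; for CPU-bound work, going higher
// mostly buys you extra context switches rather than extra throughput.
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

var cruncher = new ActionBlock<int>(item =>
{
    // Stand-in for real CPU-bound work.
    for (int n = 0; n < 10000000; n++) { }
}, options);

for (int i = 0; i < 100; i++)
    cruncher.Post(i);

cruncher.Complete();
await cruncher.Completion;
```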
If your actions are really asynchronous, then the degree of parallelism of the block is irrelevant, because something else is doing the parallelizing. But the same issue arises: your asynchronous actions are being executed unchecked, and you risk overwhelming the OS with the amount of context switching you introduce. I would seriously consider not using asynchronous actions because of this lack of control.
TPL Dataflow is not suitable for all kinds of parallel processing and it also won't make your code magically faster.
The main idea behind TDF is that you have blocks, which do their work independently. What this means is that the work for each block can be performed on a separate thread, so parallelizing your code using TDF can be very simple in some cases.
This can be especially useful if the code inside a block uses some resource that can't be shared between threads. Because that resource is touched only by its own block, you can still get great utilization of it, since that block's processing is independent of the other blocks.
In general, TDF is most suitable if your code is like a pipeline: an item comes in, gets processed by phase 1, then by phase 2, …, and finally it comes out as output (although dataflow networks can be much more complicated than that). But you shouldn't try to force it: if what you want to do doesn't fit well with TDF, you're right that you will just complicate your code for no benefit.
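A minimal sketch of such a pipeline might look like this (the phases are placeholders for whatever your stages actually do):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Phase 1 parses, phase 2 transforms, the last block writes the output.
// Each block can run on its own thread, so the phases overlap once the
// pipeline fills up.
var phase1 = new TransformBlock<string, int>(line => int.Parse(line));
var phase2 = new TransformBlock<int, int>(value => value * value);
var output = new ActionBlock<int>(result => Console.WriteLine(result));

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
phase1.LinkTo(phase2, linkOptions);
phase2.LinkTo(output, linkOptions);

foreach (var line in new[] { "1", "2", "3" })
    phase1.Post(line);

phase1.Complete();
await output.Completion;
```

With PropagateCompletion set on the links, completing the first block eventually completes the whole chain.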