0 votes

According to Wikipedia, there are three kinds of DMA modes, namely burst mode, cycle stealing mode, and transparent mode. In burst mode, the DMA controller takes over control of the bus. Before the transfer completes, CPU tasks that need the bus will be suspended. However, in each instruction cycle, the fetch cycle has to reference the main memory. Therefore, during the transfer, the CPU will be idle, doing no work, which is essentially the same as being occupied by the transfer work under interrupt-driven I/O. In my understanding, the cycle stealing mode is essentially the same. The only difference is that in that mode the CPU uses one of every two consecutive cycles, as opposed to being totally idle in burst mode. Does burst-mode DMA make a difference by skipping the fetch and decode cycles needed when using interrupt-driven I/O, and thus accomplish one transfer per clock cycle instead of one per instruction cycle, thereby speeding the process up?

Thanks a lot!


2 Answers

2 votes

how does burst-mode DMA speed up data transfer between main memory and I/O devices?

There is no "speed up" as you allege, nor is any "speed up" typically necessary/possible. The data transfer is not going to occur any faster than the slower of the source or destination.

The DMA controller will consolidate several individual memory requests into occasional burst requests, so the benefit of burst mode is reduced memory contention due to a reduction in the number of memory arbitrations.

Burst mode combined with a wide memory word improves memory bandwidth utilization. For example, with a 32-bit wide memory, four sequential byte reads consolidated into a single burst could result in only one memory access cycle.
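As a back-of-the-envelope illustration (the block size below is an assumption chosen for illustration, not a figure from the answer), consolidating byte accesses into 32-bit accesses cuts the number of bus arbitrations by a factor of the bus width:

#include <stdio.h>

int main(void)
{
    const unsigned transfer_bytes = 4096;  /* assumed block size */
    const unsigned bus_width_bytes = 4;    /* 32-bit wide memory */

    /* one arbitration per byte vs. one per consolidated 32-bit access */
    unsigned byte_at_a_time = transfer_bytes;
    unsigned word_bursts = transfer_bytes / bus_width_bytes;

    printf("byte-at-a-time accesses: %u\n", byte_at_a_time);  /* 4096 */
    printf("32-bit burst accesses:   %u\n", word_bursts);     /* 1024 */
    return 0;
}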

Before the transfer completes, CPU tasks that need the bus will be suspended.

The concept of "task" does not exist at this level of operations. There is no "suspension" of anything. At most the CPU has to wait (i.e. insertion of wait states) to gain access to memory.

However, in each instruction cycle, the fetch cycle has to reference the main memory.

Not true. A hit in the instruction cache will make a memory access unnecessary.

Therefore, during the transfer, the CPU will be idle, doing no work, which is essentially the same as being occupied by the transfer work under interrupt-driven I/O.

Faulty assumption for every cache hit.
Apparently you are misusing the term "interrupt-driven I/O" when you really mean programmed I/O using interrupts.
Equating a wait cycle or two to the execution of numerous instructions of an interrupt service routine for programmed I/O is a ridiculous exaggeration.

And "interrupt-driven IO" (in its proper meaning) does not exclude the use of DMA.

In my understanding, the cycle stealing mode is essentially the same.

Then your understanding is incorrect.

If the benefits of DMA were as minuscule or nonexistent as you allege, then how would you explain the existence of DMA controllers, and the preference for DMA over programmed I/O?

Does burst-mode DMA make a difference by skipping the fetch and decode cycles needed when using interrupt-driven I/O, and thus accomplish one transfer per clock cycle instead of one per instruction cycle, thereby speeding the process up?

Comparing DMA to "interrupt-driven I/O" is illogical. See this.

Programmed I/O using interrupts requires a lot more than just the one instruction that you allege.
I'm unfamiliar with any CPU that can read a device port, write that value to main memory, bump the write pointer, and check if the block transfer is complete all with just a single instruction.
And you're completely ignoring the ISR code (e.g. saving and then restoring processor state) that must be executed for each interrupt (which the device would issue to request data).
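To make that concrete, here is a minimal sketch in C of roughly the per-byte work a programmed-I/O receive handler has to do, on top of the CPU's interrupt entry/exit and state save/restore. All names and the device register address are hypothetical, invented only for illustration:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-mapped receive data register (address is made up). */
#define UART_RXDATA (*(volatile uint8_t *)0x4000C000u)

static uint8_t rx_buf[512];        /* destination buffer in main memory    */
static size_t rx_wr;               /* write index into rx_buf              */
static size_t rx_remaining;        /* bytes still expected in this block   */
static volatile int rx_done;       /* completion flag polled by the caller */

/* Executed once per received byte, in addition to interrupt entry/exit;
 * contrast this with a DMA controller stealing a single bus cycle. */
void uart_rx_isr(void)
{
    uint8_t byte = UART_RXDATA;    /* 1. read the device port                */
    rx_buf[rx_wr++] = byte;        /* 2. store to memory, bump write pointer */
    if (--rx_remaining == 0)       /* 3. check whether the block is complete */
        rx_done = 1;               /* 4. signal completion                   */
    /* a real driver would also acknowledge the interrupt source here */
}

Even this stripped-down version is several instructions per byte before the interrupt entry/exit overhead is counted.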

1 vote

When used with many older or simpler CPUs, burst mode DMA can speed up data transfer in cases where a peripheral is able to accept data at a rate faster than the CPU itself could supply it. On a typical ARM, for example, a loop like:

lp:
  ldr  r0,[r1,r2]  ; load a 32-bit word; r1 points to address *after* end of
                   ; buffer, r2 holds the negated count of bytes left to send
  strb r0,[r3]     ; write byte 0 to the device data register at r3
  lsr  r0,r0,#8
  strb r0,[r3]     ; write byte 1
  lsr  r0,r0,#8
  strb r0,[r3]     ; write byte 2
  lsr  r0,r0,#8
  strb r0,[r3]     ; write byte 3
  adds r2,#4       ; advance the negative index toward zero
  bne  lp          ; repeat until the whole buffer has been written

would likely take at least 11 cycles for each group of four bytes transferred (including five 32-bit instruction fetches, one 32-bit data fetch, four 8-bit writes, plus a wasted fetch for the instruction following the loop). A burst-mode DMA operation, by contrast, would need only 5 cycles per group (assuming the receiving device was able to accept data that fast).

Because a typical low-end ARM will only use the bus about every other cycle when running most kinds of code, a DMA controller that grabs the bus on every other cycle could allow the CPU to run at almost normal speed while the DMA controller performed one access every other cycle. On some platforms, it may be possible to have a DMA controller perform transfers on every cycle where the CPU isn't doing anything, while giving the CPU priority on cycles where it needs the bus. DMA performance would be highly variable in such a mode (no data would get transferred while running code that needs the bus on every cycle) but DMA operations would have no impact on CPU performance.
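For contrast with the copy loop above, here is a hedged sketch of what handing the same job to a cycle-stealing DMA controller might look like from the CPU's side. The register layout, bit definitions, and base address are entirely hypothetical, invented for this sketch rather than taken from any real controller:

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers. */
#define DMA_BASE   0x40010000u
#define DMA_SRC    (*(volatile uint32_t *)(DMA_BASE + 0x00)) /* source address      */
#define DMA_DST    (*(volatile uint32_t *)(DMA_BASE + 0x04)) /* destination address */
#define DMA_COUNT  (*(volatile uint32_t *)(DMA_BASE + 0x08)) /* bytes to transfer   */
#define DMA_CTRL   (*(volatile uint32_t *)(DMA_BASE + 0x0C)) /* mode/start control  */

#define DMA_CTRL_CYCLE_STEAL (1u << 1)  /* hypothetical cycle-stealing mode bit */
#define DMA_CTRL_START       (1u << 0)  /* hypothetical "go" bit                */

/* Hand an entire buffer to the controller and return immediately; the CPU
 * keeps executing while the controller steals bus cycles to move the data. */
static void start_dma_to_device(const uint8_t *buf, uint32_t len, uint32_t dev_reg)
{
    DMA_SRC   = (uint32_t)(uintptr_t)buf;
    DMA_DST   = dev_reg;
    DMA_COUNT = len;
    DMA_CTRL  = DMA_CTRL_CYCLE_STEAL | DMA_CTRL_START;
}

The CPU's only per-block cost is this handful of register writes (plus, typically, a completion interrupt), instead of the per-word loop shown earlier.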