There are two different "transfer rates" involved. In a well-designed system, the DMA controller must be able to interface with the address and data bus(es) at their normal operating rate. On the other hand, the interval between individual transfers may be much longer than a CPU instruction cycle, which means the controller does not move data from a source address to a destination address at the same pace the CPU could. Since almost all hardware devices attached to a system operate at a much slower pace, this is completely acceptable.
The typical purpose of DMA is to offload the CPU from the mundane task of shoveling bytes from memory to I/O ports. Consider the normal I/O sequence in the middle of a transmission (sketched as code after the list):
- get an interrupt from the port that it is ready for the next byte or word;
- perform interrupt handling, including stack operations and saving registers;
- pick up the pointers and counter from memory;
- load a data byte, store a data byte;
- increment both pointers and save them;
- decrement the counter and save it; if zero, flag end of transmission;
- return-from-interrupt handling.
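
As a rough illustration of how much work each byte costs in the interrupt-driven case, here is a minimal sketch of such a per-byte transmit handler. It assumes a hypothetical memory-mapped UART data register; the address, register name, and variable names are placeholders, not a real device map:

```c
#include <stdint.h>

/* Hypothetical memory-mapped UART data register; the address is a
 * placeholder, not a real device map. */
#define UART_DATA (*(volatile uint8_t *)0x4000A000u)

/* Transfer state the handler must fetch from memory on every interrupt. */
static const uint8_t *tx_ptr;     /* next byte to send       */
static uint32_t       tx_count;   /* bytes remaining         */
static volatile int   tx_done;    /* set at end of the block */

/* Invoked each time the port signals it is ready for the next byte.
 * The interrupt entry/exit (stacking, saving and restoring registers,
 * return-from-interrupt) is supplied by the hardware and compiler, and
 * is exactly the per-byte overhead listed above. */
void uart_tx_isr(void)
{
    UART_DATA = *tx_ptr++;    /* load a data byte, store it to the port */
    if (--tx_count == 0) {    /* decrement the counter                  */
        tx_done = 1;          /* flag end of transmission               */
        /* a real driver would also disable the tx-ready interrupt here */
    }
}
```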
With DMA in the system, the CPU spends a little more time up front programming the DMA controller, but then avoids all of the per-byte interrupts until the end of the transmission. Of course, while the DMA controller is accessing memory the CPU cannot; but typically the CPU does not access memory on every instruction anyway (adds, subtracts, and the like all take place inside the CPU with no memory access). On average, then, each byte transferred should cost the CPU less than one memory cycle (allowing for the transfers that don't interfere with the CPU at all), rather than a full interrupt-handling sequence.
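
By contrast, the up-front programming is typically just a handful of register writes. Here is a minimal sketch, assuming a hypothetical memory-mapped DMA channel with source, destination, count, and control registers; the layout, bit meanings, and base address are invented for illustration, not a specific chip:

```c
#include <stdint.h>

/* Hypothetical DMA channel register block; layout, bit meanings, and the
 * base address are invented for illustration, not a specific controller. */
typedef struct {
    volatile uint32_t src;    /* source address in memory            */
    volatile uint32_t dst;    /* destination address (the I/O port)  */
    volatile uint32_t count;  /* number of bytes to move             */
    volatile uint32_t ctrl;   /* bit 0 = enable, bit 1 = irq at end  */
} dma_channel_t;

#define DMA0 ((dma_channel_t *)0x40001000u)

/* One-time setup: after these few register writes the controller moves
 * the whole block itself, stealing memory cycles only when it needs
 * them, and raises a single interrupt when the transfer completes. */
void dma_start_tx(const uint8_t *buf, uint32_t len, uint32_t port_addr)
{
    DMA0->src   = (uint32_t)(uintptr_t)buf;
    DMA0->dst   = port_addr;
    DMA0->count = len;
    DMA0->ctrl  = 0x3u;   /* enable + interrupt on completion */
}
```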