I am using Intel X520 and X540 dual-port NICs attached to a Dell PowerEdge server. All NIC ports can run at 10 Gbps, for a total of 40 Gbps. The system has two sockets, each with a Xeon E5-2640 v3 CPU (Haswell microarchitecture).
I am facing several problems that could be diagnosed with PCIe and DMA benchmarking, but I could not find a proper way to do it. Even with DPDK-based drivers and libraries I am unable to reach 40 Gbps throughput with 64-byte packets. I need to run the experiments with 64-byte packets and cannot change the packet size.
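For context, this is the back-of-the-envelope arithmetic I am working from: the 20 bytes of per-frame overhead (preamble, SFD, inter-frame gap) is standard Ethernet, but the per-packet PCIe cost below is only my assumption for illustration, not a measured value.

```c
#include <stdio.h>

/* Rough line-rate arithmetic for 64-byte frames on 4 x 10GbE ports.
 * The 20 bytes of per-frame overhead (preamble + SFD + inter-frame gap)
 * is standard Ethernet; the PCIe per-packet cost is an assumed,
 * illustrative value, not a measured one. */
int main(void)
{
    const double port_rate_bps = 10e9;  /* 10 Gbps per port        */
    const int    num_ports     = 4;     /* 2 dual-port NICs        */
    const int    frame_bytes   = 64;    /* packet size on the wire */
    const int    wire_overhead = 20;    /* preamble + SFD + IFG    */

    double pps_per_port = port_rate_bps / ((frame_bytes + wire_overhead) * 8.0);
    double total_pps    = pps_per_port * num_ports;

    /* Assumed per-packet PCIe cost: 64 B payload plus a guessed amount
     * of TLP framing and descriptor traffic -- shown only to illustrate
     * why small packets stress the PCIe link more than large ones. */
    const double assumed_pcie_bytes_per_pkt = 64 + 24 + 16;

    printf("per-port rate : %.2f Mpps\n", pps_per_port / 1e6);
    printf("total rate    : %.2f Mpps\n", total_pps / 1e6);
    printf("approx PCIe BW: %.2f GB/s (assumed %g B/pkt)\n",
           total_pps * assumed_pcie_bytes_per_pkt / 1e9,
           assumed_pcie_bytes_per_pkt);
    return 0;
}
```

At 64 bytes this works out to about 14.88 Mpps per port and roughly 59.5 Mpps aggregate, which is why small-packet tests are dominated by per-packet PCIe and descriptor overhead rather than raw payload bandwidth.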
I am generating packets with DPDK pktgen and counting PCIe events with Intel PCM (./pcm-pcie.x). However, the counting only goes one way: I can count the number of events, but I cannot tell what the maximum number of each event the system can sustain is. The results from pcm-pcie.x:
Skt   PCIeRdCur     RFO      CRd     DRd    ItoM    PRd     WiL
 0      73 M      3222 K    784 K   63 M    52 M     0     2791 K
My NICs are attached to socket 0, which is why I have omitted the socket 1 results.
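My rough interpretation of these counters, shown below, is that they are reported in units of cache-line transfers, so multiplying the counts by 64 bytes and dividing by the sampling interval gives an approximate DMA bandwidth. The mapping of PCIeRdCur to device reads (TX DMA) and ItoM to full-line device writes (RX DMA), and the 1-second interval, are my assumptions, not something I have verified.

```c
#include <stdio.h>

#define CACHE_LINE_BYTES 64  /* pcm-pcie counts cache-line transfers */

/* Rough conversion of pcm-pcie event counts to DMA bandwidth.
 * Assumptions (mine, not verified): PCIeRdCur ~ device reads from
 * memory (NIC TX DMA), ItoM ~ full-line device writes (NIC RX DMA),
 * and the sample above was taken over a 1-second interval. */
int main(void)
{
    const double interval_s = 1.0;   /* assumed sampling interval */
    const double rdcur      = 73e6;  /* PCIeRdCur from socket 0   */
    const double itom       = 52e6;  /* ItoM from socket 0        */

    double read_bw  = rdcur * CACHE_LINE_BYTES / interval_s;
    double write_bw = itom  * CACHE_LINE_BYTES / interval_s;

    printf("approx device read  BW: %.2f GB/s\n", read_bw  / 1e9);
    printf("approx device write BW: %.2f GB/s\n", write_bw / 1e9);
    return 0;
}
```

If my understanding is correct, newer pcm-pcie builds can also print a bandwidth estimate directly (the -B option), which does essentially this conversion internally.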
Is there any way to benchmark the PCIe bus and the DMA engine? And is there any way to get precise latency in the I/O subsystem (at each level) for packet processing? I can't use rdtsc() to measure hardware-level latencies.