3 votes

I am trying to access the DMA address in a NIC directly from another PCIe device in Linux. Specifically, I am trying to read it from an NVIDIA GPU so as to bypass the CPU altogether. I have researched zero-copy networking and DMA-to-userspace posts, but they either didn't answer the question or involved a copy from kernel space to user space. I am trying to avoid using any CPU cycles because the delay they introduce is inconsistent, and I have very tight latency requirements.

I got hold of the NIC driver for the Intel card I use (the e1000e driver) and found where the ring buffers are allocated. As I understood from a paper I was reading, I am interested in the descriptor of type dma_addr_t; the rx_ring struct also has a member called dma. I pass both the desc and the dma members to user space through an ioctl call (sketched below), but I am unable to get anything in the GPU besides zeros.
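The kernel side of the ioctl essentially just hands those two values up to user space. Simplified, it looks something like this (the handler name, the struct, and the get_rx_ring() helper are placeholders, but desc and dma are the actual e1000e ring fields):

#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/types.h>

/* Struct copied out to user space; the name and layout are placeholders */
struct dma_info {
    __u64 desc;   /* kernel virtual address of the descriptor ring (rx_ring->desc) */
    __u64 dma;    /* bus/DMA address of the ring (rx_ring->dma, a dma_addr_t)      */
};

static long dma_info_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    /* struct e1000_ring comes from the driver's e1000.h; the lookup is elided */
    struct e1000_ring *rx_ring = get_rx_ring();   /* placeholder */
    struct dma_info info;

    info.desc = (__u64)(uintptr_t)rx_ring->desc;
    info.dma  = (__u64)rx_ring->dma;

    if (copy_to_user((void __user *)arg, &info, sizeof(info)))
        return -EFAULT;
    return 0;
}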

The GPU code is as follows:

int *setup_gpu_dma(u64 addr)                                                     
{                                                                                
    // Allocate GPU memory                                                       
    int *gpu_ptr;                                                                
    cudaMalloc((void **) &gpu_ptr, MEM_SIZE);                                    

    // Allocate memory in user space to read the stuff back                      
    int *h_data;                                                                 
    cudaMallocHost((void **)&h_data, MEM_SIZE);                                  

    // Present FPGA memory to CUDA as CPU locked pages                           
    int error = cudaHostRegister((void **) &addr, MEM_SIZE,                      
        CU_MEMHOSTALLOC_DEVICEMAP);                                              
    cout << "Allocation error = " << error << endl;                              

    // DMA from GPU memory to FPGA memory                                        
    cudaMemcpy((void **) &gpu_ptr, (void **)&addr,   MEM_SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy((void **) &h_data, (void **)&gpu_ptr, MEM_SIZE, cudaMemcpyDeviceToHost);

    // Print the data                                                            

    // Clean up 
}                        

What am I doing wrong?

Where it says "FPGA" in the comments, should it say "NIC buffer"? – Roger Dahl
Yes, it should. I copied part of this code from another example doing something very similar on Windows, where they were reading from an FPGA PCIe board. Sorry about that. – jrk0414
Have you had a chance to look at NVIDIA's GPUDirect documentation: docs.nvidia.com/cuda/gpudirect-rdma/index.html – njuffa
Yes I have, but there are a couple of issues with it. First, it does not work with GeForce GPUs. Instead of giving the GPU address to the NIC, we would have to give the NIC address to the GPU. Second, the NIC drivers use ring buffers, which, as I understand it, makes them hard to use with GPU memory. – jrk0414

1 Answer

1 vote

cudaHostRegister() operates on already-allocated host memory, so you have to pass addr itself, not &addr (which is just the address of your local u64 variable).

If addr is not a host pointer, this will not work. If it is a host pointer, your function interface should use void * and then there will be no need for the typecast.
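To illustrate, a corrected version could look roughly like this (untested sketch: it assumes addr really is a valid pointer in your process, e.g. obtained via mmap() of the buffer, keeps your MEM_SIZE, drops the stray & in the cudaMemcpy() calls as well, and uses the runtime-API flag cudaHostRegisterMapped rather than the driver-API constant):

#include <cuda_runtime.h>
#include <iostream>

int *setup_gpu_dma(void *addr)               // addr: host-visible mapping of the NIC buffer
{
    // Allocate GPU memory
    int *gpu_ptr;
    cudaMalloc((void **)&gpu_ptr, MEM_SIZE);

    // Allocate host memory to read the data back
    int *h_data;
    cudaMallocHost((void **)&h_data, MEM_SIZE);

    // Pin the already-mapped host memory; note: addr itself, not &addr
    int error = cudaHostRegister(addr, MEM_SIZE, cudaHostRegisterMapped);
    std::cout << "Registration error = " << error << std::endl;

    // Copy NIC buffer -> GPU memory, then GPU memory -> host memory for inspection
    cudaMemcpy(gpu_ptr, addr, MEM_SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(h_data, gpu_ptr, MEM_SIZE, cudaMemcpyDeviceToHost);

    // ... print h_data here ...

    // Clean up
    cudaHostUnregister(addr);
    cudaFree(gpu_ptr);
    return h_data;                            // caller releases with cudaFreeHost()
}

Keep in mind that a raw dma_addr_t (a bus address) coming back from your ioctl is not something cudaHostRegister() can accept; you first need a CPU-visible mapping of that memory in your process before anything like the above can work.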