I need to provide a huge circular buffer (a few GB) for the bus-mastering DMA PCIe device implemented in FPGA.

The buffer should not be reserved at boot time, so it may not be physically contiguous.

The device supports scatter-gather (SG) operation, but for performance reasons, the addresses and lengths of consecutive contiguous segments of the buffer are stored inside the FPGA. Therefore, usage of standard 4KB pages is not acceptable (there would be up to 262144 segments for each 1GB of the buffer).
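As a sanity check on that figure, here is a minimal sketch of the worst-case segment count, assuming every page ends up as its own segment. The helper name `max_segments` and the constant are only illustrative and do not come from any kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE (4ULL * 1024)   /* standard 4 KB page */

/* Worst case: no two pages are physically adjacent, so each page
 * needs its own (address, length) entry in the FPGA. */
static uint64_t max_segments(uint64_t buf_bytes, uint64_t page_bytes)
{
    assert(page_bytes != 0);
    /* Round up so that a partially used last page still counts. */
    return (buf_bytes + page_bytes - 1) / page_bytes;
}
```

Calling `max_segments(1ULL << 30, SMALL_PAGE_SIZE)` gives 262144, the number quoted above for 1 GB of buffer.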

The right solution should allocate the buffer from 2MB hugepages in user space (reducing the maximum number of segments by a factor of 512). The virtual address of the buffer should be passed to the kernel driver via ioctl. Then the addresses and lengths of the segments should be calculated and written to the FPGA.
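The user-space side of that plan could look roughly like the sketch below. It assumes Linux, a default hugepage size of 2MB, and enough hugepages reserved by the administrator (for example via `/proc/sys/vm/nr_hugepages`). The helper names are hypothetical, and the driver-specific ioctl that passes the pointer and length to the kernel is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* assumed default hugepage size */

/* Round a byte count up to a whole number of 2MB hugepages. */
static size_t round_up_to_hugepage(size_t bytes)
{
    return (bytes + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Map an anonymous hugepage-backed buffer.
 * Returns NULL if the kernel cannot provide the hugepages. */
static void *alloc_hugepage_buffer(size_t bytes, size_t *mapped_len)
{
    size_t len = round_up_to_hugepage(bytes);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}
```

The returned pointer and `mapped_len` are what the application would hand to the driver. The mapping is released with `munmap(p, mapped_len)` once the DMA engine has been stopped.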

In theory, I could use get_user_pages to build the list of pages and then call sg_alloc_table_from_pages to obtain an SG list suitable for programming the DMA engine in the FPGA. Unfortunately, this approach requires an intermediate array of struct page pointers with 262144 entries per 1GB of the buffer. That array is stored in RAM, not in the FPGA, so it is less problematic, but it would still be good to avoid it.
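The reason this approach still yields few segments is that physically adjacent pages are merged into one entry. The following is only a user-space model of that merging step, not the kernel implementation: the `struct segment` type, the function name, and the fixed 4KB page shift are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* 4 KB pages in this model */

struct segment {
    uint64_t addr;   /* physical start address of the segment */
    uint64_t len;    /* segment length in bytes */
};

/* Merge runs of consecutive page frame numbers into segments.
 * Returns the number of segments written, or SIZE_MAX if seg[]
 * is too small to hold them all. */
static size_t coalesce_pfns(const uint64_t *pfn, size_t n_pages,
                            struct segment *seg, size_t max_seg)
{
    size_t n_seg = 0;

    for (size_t i = 0; i < n_pages; i++) {
        if (n_seg > 0 && pfn[i] == pfn[i - 1] + 1) {
            /* Physically adjacent to the previous page: extend. */
            seg[n_seg - 1].len += 1ULL << MODEL_PAGE_SHIFT;
            continue;
        }
        if (n_seg == max_seg)
            return SIZE_MAX;
        seg[n_seg].addr = pfn[i] << MODEL_PAGE_SHIFT;
        seg[n_seg].len = 1ULL << MODEL_PAGE_SHIFT;
        n_seg++;
    }
    return n_seg;
}
```

Feeding it the 1024 frame numbers of two non-adjacent 2MB hugepages produces two segments of 2MB each, while the input array itself still needs one entry per 4KB page.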

In fact, I don't need to keep the pages mapped for the kernel, as the hugepages are protected against swapping out, and they are mapped for the user space application that will process the received data.

So what I'm looking for is something like a hypothetical function sg_alloc_table_from_user_hugepages, which would take the user-space address of a hugepage-backed memory buffer and convert it directly into the right scatterlist, without performing an unnecessary and memory-consuming mapping for the kernel. Of course, such a function should verify that the buffer indeed consists of hugepages.

I have found and read these posts: (A), (B), but couldn't find a good answer. Is there any official method to do it in the current Linux kernel?

1 Answer

At the moment I have a very inefficient solution based on get_user_pages_fast:

   int sgt_prepare(const char __user *buf, size_t count, 
               struct sg_table * sgt, struct page *** a_pages,
               int * a_n_pages)
   {
       int res = 0;
       int n_pages;
       struct page ** pages = NULL;
       const unsigned long offset = ((unsigned long)buf) & (PAGE_SIZE-1);
       //Calculate number of pages
       n_pages = (offset + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
       printk(KERN_ALERT "n_pages: %d",n_pages);
       //Allocate the table for pages
       pages = vzalloc(sizeof(* pages) * n_pages);
       printk(KERN_ALERT "pages: %p",pages);
       if(pages == NULL) {
           res = -ENOMEM;
           goto sglm_err1;
       }
       //Now pin the pages
       //The device writes received data into the buffer, so pin the pages for writing (1 == FOLL_WRITE)
       res = get_user_pages_fast(((unsigned long)buf & PAGE_MASK), n_pages, 1, pages); 
       printk(KERN_ALERT "gupf: %d",res);   
       if(res < n_pages) {
           int i;
           for(i=0; i<res; i++)
               put_page(pages[i]);
           res = -ENOMEM;
           goto sglm_err1;
       }
       //Now create the sg-list
       res = sg_alloc_table_from_pages(sgt, pages, n_pages, offset, count, GFP_KERNEL);
       printk(KERN_ALERT "satf: %d",res);   
       if(res < 0)
           goto sglm_err2;
       *a_pages = pages;
       *a_n_pages = n_pages;
       return res;
   sglm_err2:
       //Here we jump if we know that the pages are pinned
       {
           int i;
           for(i=0; i<n_pages; i++)
               put_page(pages[i]);
       }
   sglm_err1:
       if(sgt) sg_free_table(sgt);
       if(pages) vfree(pages); //pages was allocated with vzalloc, so it must be freed with vfree, not kfree
       * a_pages = NULL;
       * a_n_pages = 0;
       return res;
   }
   
   void sgt_destroy(struct sg_table * sgt, struct page ** pages, int n_pages)
   {
       int i;
       //Free the sg list
       if(sgt->sgl)
           sg_free_table(sgt);
       //Unpin pages
       for(i=0; i < n_pages; i++) {
           set_page_dirty(pages[i]);
           put_page(pages[i]);
       }
   }

The sgt_prepare function fills in the sg_table structure sgt, which I can then use to create the DMA mapping. I have verified that the number of entries it contains equals the number of hugepages used.

Unfortunately, it requires that the list of the pages is created (allocated and returned via the a_pages pointer argument), and kept as long as the buffer is used.

Therefore, I really dislike that solution. Currently I use 256 2MB hugepages as a DMA buffer. This means that I have to create and keep an array of 128*1024 unnecessary struct page pointers. I also waste 512 MB of kernel address space on an unnecessary kernel mapping.

The interesting question is whether a_pages needs to be kept only temporarily (until the sg-list is created). In theory it should be possible, as the pages remain pinned...
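One argument that the array is redundant once the sg-list exists: the segments alone carry enough information to visit every page again, for example to unpin them. The sketch below is a user-space model of that idea with hypothetical names (`struct segment`, `for_each_page_in_segments`, `count_page`). In the kernel, the scatterlist page iterators (`for_each_sg_page` together with `sg_page_iter_page`) look like the matching tool, but I have not tried that in this driver, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* 4 KB pages in this model */

struct segment {
    uint64_t addr;   /* physical start address of the segment */
    uint64_t len;    /* segment length in bytes */
};

typedef void (*page_fn)(uint64_t pfn, void *ctx);

/* Call fn once for every 4 KB page covered by the segments.
 * Returns the number of pages visited. */
static size_t for_each_page_in_segments(const struct segment *seg,
                                        size_t n_seg, page_fn fn, void *ctx)
{
    size_t visited = 0;

    for (size_t s = 0; s < n_seg; s++) {
        if (seg[s].len == 0)
            continue;
        uint64_t first = seg[s].addr >> MODEL_PAGE_SHIFT;
        uint64_t last = (seg[s].addr + seg[s].len - 1) >> MODEL_PAGE_SHIFT;

        for (uint64_t pfn = first; pfn <= last; pfn++) {
            fn(pfn, ctx);
            visited++;
        }
    }
    return visited;
}

/* Stand-in for the per-page release step (put_page in the driver). */
static void count_page(uint64_t pfn, void *ctx)
{
    (void)pfn;
    ++*(size_t *)ctx;
}
```

With two 2MB segments, `for_each_page_in_segments(seg, 2, count_page, &n)` visits 1024 pages, the same set the pages array described, so in this model nothing is lost by freeing the array early.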