How memory is mapped to gpu (opencl Intel graphics)

Question

I'm using intel integrated gpu for my opencl implementation. I am implementing a program with zero copy where I'm not copying the data to the gpu instead it shares the common memory(RAM).

I have a 64bit cpu, but in gpu specs it shows it has only 32 bit addressing mode.

I'm sharing a malloc heap space between the gpu and cpu and when I print the address I see the following.

In GPU:

if(id==0){
        printf("Mem address: %p\n",A); 

//Outputs Mem address: 0x1010000

In CPU: it prints

printf("Outside Mem address: %p\n",cpuA);
Device: Intel(R) HD Graphics IvyBridge M GT2
Outside Mem address: 0x7fcd529d9000

I am not getting how is it mapped in gpu. And I wonder if 2^28/2^32 is the maximum address gpu could access?.

pmdj pmdj · Accepted Answer · 2019-06-26T12:02:42

The memory address you are printing on the host is a virtual address that only makes sense in the context of your program's process. In the CPU, this is transparently translated to a physical RAM page, the address of which is unrelated to the virtual address but stored in a lookup table (page table) maintained by the operating system. Note that "64-bit CPU" typically refers to the number of bits in a virtual address. (Although many 64-bit CPUs actually ignore 8-16 bits.) The number of bits for physical addresses (for addressing physical RAM cells and mapped device memory) is often a lot less, as little as 40 bits.

Devices attached to the system and able to perform direct memory accesses (DMA) most commonly deal with physical memory addresses. If your Intel GPU does not have an internal memory mapping scheme (and there is no IOMMU active, see below) then the address you are seeing in your OpenCL kernel code is probably a physical memory address. If the device can only address 32 bits, this means it can only access the first 4GiB of physical memory in your system. By assigning memory above 4GiB to devices and user space processes that aren't affected by a 32-bit restriction, or by using "bounce buffers", the operating system can arrange for any buffers used by the restricted device to be located in that memory area, regardless of virtual address.

Recently, IOMMUs have become common. These introduce a virtual memory like mapping system for devices as well - so the memory addresses a device sees are again unrelated to the physical addresses of system memory they correspond to. This is primarily a security feature - ideally, each device gets its own address space, so devices can't accidentally or deliberately access system memory they should not be accessing. It also means that the 32-bit limitation becomes completely irrelevant, as each device gets its own 32-bit address space which can be mapped to physical memory beyond the 4GiB boundary.

How memory is mapped to gpu (opencl Intel graphics)

1 Answers