5 votes

I'm writing a server process that performs calculations on a GPU using CUDA. I want to queue up incoming requests until enough memory is available on the device to run the job, but I'm having a hard time figuring out how much memory I can allocate on the device. I have a pretty good estimate of how much memory a job requires (at least how much will be allocated through cudaMalloc()), but I get a device out-of-memory error long before I've allocated the total amount of global memory available.

Is there some kind of formula to compute, from the total global memory, the amount I can allocate? I can play with it until I get an estimate that works empirically, but I'm concerned my customers will deploy different cards at some point and my jerry-rigged numbers won't work very well.

Interesting. It was my impression that you could allocate the entire global memory space, maybe less a small amount. Is your graphics card being used by anything else in the system? If you're using CUDA 4.0, you might be able to check using the CUDA Tools SDK or an already-built tool... I'll test my system right now and see whether I have the same problem. – Patrick87
I should probably have mentioned that I'm using CUFFT, which I can't track directly, but the docs say it can take up to 3x the FFT size in memory. That doesn't seem like nearly enough to account for the discrepancy. – John Gordon
What GPU are you using? How much global memory does it have? Check with the deviceQuery tool in the SDK. I just played around a little and it looks like I can easily allocate 1208 of 1280 MB of global memory on my GTX 470, with a single call to cudaMalloc no less. There's a strong possibility that CUFFT is responsible; otherwise, there may be a memory leak in your program. Are you calling cudaFree like you should? It could also be a memory leak in a lib you're using. – Patrick87
It's a Quadro 600 with 1 GB of memory. I have my cudaMallocs and frees wrapped in a pointer class. I added tracking and print statements, and I can see that I don't have leaks. Also, the server can run for a long time with small jobs without any problems, so a leak is unlikely. – John Gordon
And how much memory can you allocate before it craps out? It seems like a worst-case estimate using CUFFT with FFT size n is going to be something like n = (T - k) / 3, where T is the total advertised global memory, n is the amount you can use with CUFFT, and k is a small overhead amount. Taking your k to be 5/6 of mine (= 60 MB) and T = 1024 MB, you're looking at an n of around 321 MB. How much are you actually getting? – Patrick87
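
(A minimal sketch of that back-of-the-envelope estimate, assuming the "up to 3x the FFT size" figure from the CUFFT docs; the overhead constant k is an empirical guess, not a documented value:)

    /* Rough worst-case estimate of the FFT working size that fits in
       device memory, assuming CUFFT can use up to 3x the FFT size.
       Both arguments are in bytes; overhead_bytes is an empirical guess. */
    size_t usable_fft_bytes(size_t total_bytes, size_t overhead_bytes)
    {
        return (total_bytes - overhead_bytes) / 3;
    }

    /* usable_fft_bytes(1024ULL << 20, 60ULL << 20) is roughly 321 MB */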

1 Answer

5 votes

The size of your GPU's DRAM is an upper bound on the amount of memory you can allocate through cudaMalloc, but there's no guarantee that the CUDA runtime can satisfy a request for all of it in a single large allocation, or even in a series of smaller allocations.

The constraints on memory allocation vary with the details of the operating system's underlying driver model. For example, if the GPU in question is the primary display device, then it's possible the OS has reserved some portion of the GPU's memory for graphics. Other implicit state the runtime uses (such as the heap) also consumes memory resources. It's also possible that memory has become fragmented, so that no contiguous block large enough to satisfy the request exists.

The CUDART API function cudaMemGetInfo reports the free and total amount of memory available. As far as I know, there's no similar API call which can report the size of the largest satisfiable allocation request.
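
As a sketch of how you might work around that (not something the runtime provides directly): query cudaMemGetInfo for the free/total figures, then probe for the largest single allocation with trial cudaMalloc calls. The 1 MB binary-search granularity below is an arbitrary choice:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        printf("free: %zu MB, total: %zu MB\n",
               free_bytes >> 20, total_bytes >> 20);

        /* Binary-search for the largest single cudaMalloc that succeeds.
           There is no API for this, so trial allocations are the only
           option; the 1 MB granularity is arbitrary. */
        size_t lo = 0, hi = free_bytes;
        const size_t step = 1 << 20;
        while (hi - lo > step) {
            size_t mid = lo + (hi - lo) / 2;
            void *p = NULL;
            if (cudaMalloc(&p, mid) == cudaSuccess) {
                cudaFree(p);
                lo = mid;           /* mid bytes fit in one block */
            } else {
                cudaGetLastError(); /* clear the allocation error */
                hi = mid;           /* mid bytes do not fit */
            }
        }
        printf("largest single cudaMalloc: about %zu MB\n", lo >> 20);
        return 0;
    }

Note that the probe only gives a point-in-time answer; other contexts can allocate or free device memory between the probe and your real cudaMalloc.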