My program analyzes a video file, which is represented as a 3D array and sent from LabView to my program. LabView already flattens this 3D array into a 1D array, so I have just been allocating a 1D array in CUDA with cudaMalloc and using cudaMemcpy to copy the data over. However, I noticed that if I send more than 2XXX 120x240 pixel images, I get an "unknown error" from some of my CUDA memory functions (cudaMemcpy and cudaFree, which occur later in my program after a few kernels are called), and these ultimately break my program. If I lower the number of images I am sending, I don't have a problem. This leads me to believe that my code is fine, but my memory allocation practices are bad.
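Simplified, the pattern I have been using looks roughly like this (the float pixel type and the frame count here are placeholders; the real data and dimensions come from LabView):

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t width     = 240;   // pixels per row
    const size_t height    = 120;   // rows per frame
    const size_t numFrames = 2000;  // placeholder; the real count comes from LabView
    const size_t numBytes  = width * height * numFrames * sizeof(float);

    // h_frames stands in for the flattened 1D array received from LabView
    float *h_frames = (float *)malloc(numBytes);

    float *d_frames = NULL;
    cudaError_t err = cudaMalloc((void **)&d_frames, numBytes);
    if (err != cudaSuccess)
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));

    err = cudaMemcpy(d_frames, h_frames, numBytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess)
        printf("cudaMemcpy failed: %s\n", cudaGetErrorString(err));

    // ... a few kernels run here, then the later copies and frees
    //     are where the "unknown error" shows up ...

    cudaFree(d_frames);
    free(h_frames);
    return 0;
}
```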
To start, let's talk about pitched memory. As far as I am aware, it is about padding each row of the allocation so that rows start at aligned addresses and a row's data is not split across two chunks of memory. This matters especially for 2D and 3D arrays, since you want to keep rows (or columns) together in memory for fast access.
Could these kinds of problems occur if I don't use pitched memory? What kinds of errors can occur when not using pitched memory, especially with very large arrays like these? Up to this point I have ignored cudaMallocPitch and cudaMalloc3D, even though I technically have 2D and 3D arrays, which I have flattened.
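If I did switch, I assume the pitched version would look something like this, treating the whole stack as one tall 2D array of rows (again, float pixels and the frame count are just placeholders):

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t width      = 240;   // pixels per row
    const size_t height     = 120;   // rows per frame
    const size_t numFrames  = 2000;  // placeholder
    const size_t widthBytes = width * sizeof(float);
    const size_t totalRows  = height * numFrames;

    float *h_frames = (float *)malloc(widthBytes * totalRows);

    float *d_frames = NULL;
    size_t pitch = 0;                // bytes per padded row, chosen by CUDA
    cudaError_t err = cudaMallocPitch((void **)&d_frames, &pitch,
                                      widthBytes, totalRows);
    if (err != cudaSuccess)
        printf("cudaMallocPitch failed: %s\n", cudaGetErrorString(err));

    // cudaMemcpy2D handles the difference between the tightly packed host
    // layout and the pitched device layout
    err = cudaMemcpy2D(d_frames, pitch, h_frames, widthBytes,
                       widthBytes, totalRows, cudaMemcpyHostToDevice);
    if (err != cudaSuccess)
        printf("cudaMemcpy2D failed: %s\n", cudaGetErrorString(err));

    // In kernels, row r would start at (float *)((char *)d_frames + r * pitch)

    cudaFree(d_frames);
    free(h_frames);
    return 0;
}
```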
Finally, how can I further debug problems with my code when cudaGetLastError only tells me "unknown error"? I am able to find which function call is at fault, but when it is something like cudaFree, I have no way to debug it further or find out where the problem originates.
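The only extra checking I have come up with so far is wrapping every runtime call in a macro and adding a cudaDeviceSynchronize() after each kernel launch, something like:

```c
#include <cuda_runtime.h>
#include <stdio.h>

// Wrapper so every CUDA runtime call reports its error string and location
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            printf("CUDA error %s at %s:%d\n",                            \
                   cudaGetErrorString(err_), __FILE__, __LINE__);         \
        }                                                                 \
    } while (0)

__global__ void dummyKernel(float *data)   // stand-in for my real kernels
{
    data[0] = 0.0f;
}

int main(void)
{
    float *d_data = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_data, 1024 * sizeof(float)));

    dummyKernel<<<1, 1>>>(d_data);
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors
    CUDA_CHECK(cudaDeviceSynchronize());  // surfaces errors raised while the kernel ran,
                                          // instead of at a later cudaMemcpy/cudaFree

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

(I have not tried running the program under cuda-memcheck yet; would that help localize this kind of problem?)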
Anyway, thanks for the help.