
I'm porting code originally written in CUDA to OpenCL so that it runs on an Altera FPGA. I'm having trouble reading back data that should be in the buffer. I use the same structure as the CUDA version; the only difference is that cudaMalloc can allocate memory for any pointer type, whereas clCreateBuffer returns a cl_mem. My code looks like this:

cl_mem d_buffer=clCreateBuffer(...);
 //CUDA version:
 //float* d_buffer;
 //cudaMalloc((void **)&d_buffer, MemSz);

clEnqueueWriteBuffer(queue, d_buffer, ..., h_data, );
 //cudaMemcpy(d_buffer, h_Data, MemSz, cudaMemcpyHostToDevice);

#define d_buffer(index1, index2, index3) &d_buffer + index1/index2*index3
 //#define d_buffer(index1, index2, index3) d_buffer + index1/index2*index3

cl_mem* d_data=d_buffer(1,2,3);

clEnqueueReadBuffer(queue, *d_data,...)// Error reading d_data

I also tried clEnqueueMapBuffer, and passing CL_MEM_ALLOC_HOST_PTR to clCreateBuffer, but neither works.


1 Answer


cl_mem is an opaque object. You should not perform pointer arithmetic on it; attempting to do so will result in very nasty bugs.

CUDA and OpenCL differ here. cudaMalloc gives you a raw device pointer, so the host can offset it and pass the result to cudaMemcpy (although the host still cannot dereference it). A cl_mem is a handle rather than an address, so in OpenCL the offset goes in the offset argument of the read, write, or map call instead. A buffer's contents are also never implicitly visible to the host: OpenCL lets you "map" a buffer into host-visible memory, but until you do, the host cannot see it. If you intend to read an arbitrary index of the buffer, you need to either map it first or copy it to host memory.

float * h_data = new float[1000];
cl_mem d_buffer=clCreateBuffer(...);

clEnqueueWriteBuffer(queue, d_buffer, true, 0, 1000 * sizeof(float), h_data, 0, nullptr, nullptr);
//======OR======
//float * d_data = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_WRITE, 0, 1000 * sizeof(float), 0, nullptr, nullptr, nullptr));
//std::copy(h_data, h_data + 1000, d_data);
//clEnqueueUnmapMemObject(queue, d_buffer, d_data, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);

//Do work with buffer, probably in OpenCL Kernel...

float result;
size_t index = 1 / 2 * 3; //This is what you wrote in the original post; note that integer division makes it 0
clEnqueueReadBuffer(queue, d_buffer, true, index * sizeof(float), 1 * sizeof(float), &result, 0, nullptr, nullptr);
//======OR======
//float * result_ptr = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_READ, index * sizeof(float), 1 * sizeof(float), 0, nullptr, nullptr, nullptr));
//result = *result_ptr;
//clEnqueueUnmapMemObject(queue, d_buffer, result_ptr, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);

std::cout << "Result was " << result << std::endl;
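
One way to port the d_buffer(index1, index2, index3) macro from the question is to have it produce a byte offset instead of a pointer. The following is only a minimal sketch, assuming the buffer holds float elements; element_offset and byte_offset are hypothetical names chosen for illustration, not part of OpenCL:

```cpp
#include <cstddef>

// Hypothetical helper: the element index the question's macro computes.
// It uses integer arithmetic, so index1 / index2 truncates toward zero.
constexpr std::size_t element_offset(std::size_t index1,
                                     std::size_t index2,
                                     std::size_t index3) {
    return index1 / index2 * index3;
}

// Hypothetical helper: the same position expressed in bytes, which is the
// unit the offset argument of clEnqueueReadBuffer expects.
// Assumes the buffer stores float elements.
constexpr std::size_t byte_offset(std::size_t index1,
                                  std::size_t index2,
                                  std::size_t index3) {
    return element_offset(index1, index2, index3) * sizeof(float);
}
```

The result would then be passed as the offset argument, for example clEnqueueReadBuffer(queue, d_buffer, true, byte_offset(1, 2, 3), sizeof(float), &result, 0, nullptr, nullptr), while d_buffer itself stays an untouched cl_mem.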