I am new to OpenCL and am learning it from Matthew Scarpino's “OpenCL in Action”. I worked through the example that multiplies a matrix by a vector (pp. 11-13). For some reason, the example did not work on my computer: the kernel did not return any values. So I began looking for simpler examples of getting data out of a kernel.
On Wesley Shillingford’s YouTube channel I found an example that outputs the string “Hello world!” from the kernel. It worked on my home computer. However, the inner workings of OpenCL stayed hidden from me, because the example was written in C++ and the brevity of the code obscured what was actually happening. So I kept looking for examples written in C.
Among the answers on Stack Overflow, I found a minimal OpenCL program that increments a value in the kernel. I took this code as the basis for my own program, because it is simple and convenient for a beginner. As I later found out, the example contained an error.
Another great example convinced me to use pointers to return data from the kernel. When I used arrays to hold the kernel's output, the values in the target array did not change and the kernel's data was lost on output, so I concluded that pointers are needed to get data out of the kernel. However, this did not help me: the problem of transferring data from the kernel to the host program remains. Please correct me if I am mistaken about something.
The essence of the question: the kernel does not return character data to the host program. What could be the problem?
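As a side check on the array-versus-pointer point, here is a minimal sketch in plain C with no OpenCL involved. `fake_read_buffer` is a hypothetical stand-in that copies with `memcpy`; it only mimics the fact that the destination parameter of `clEnqueueReadBuffer` is a plain `void*`. Under that assumption, a fixed-size array and a `malloc`'ed block behave the same as destinations, which suggests the choice between them is probably not what loses the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device-to-host copy. It receives a plain
   void* destination, like the ptr parameter of clEnqueueReadBuffer. */
void fake_read_buffer(const char *device_data, size_t size, void *host_ptr)
{
    assert(device_data != NULL && host_ptr != NULL);
    memcpy(host_ptr, device_data, size);
}

/* Destination is a fixed-size array; it decays to char* at the call site. */
int read_into_array(void)
{
    static const char device_data[8] = "Privet!";
    char host_array[8] = {0};

    fake_read_buffer(device_data, sizeof host_array, host_array);
    return strcmp(host_array, "Privet!") == 0;
}

/* Destination is a heap block obtained from malloc. */
int read_into_heap(void)
{
    static const char device_data[8] = "Privet!";
    char *host_ptr = malloc(8);
    int ok;

    if (host_ptr == NULL)
        return 0;
    fake_read_buffer(device_data, 8, host_ptr);
    ok = strcmp(host_ptr, "Privet!") == 0;
    free(host_ptr);
    return ok;
}
```

Both `read_into_array()` and `read_into_heap()` return 1 when the copy arrives intact, so either kind of host storage should be usable as long as it is large enough for the bytes being read.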
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
cl_platform_id *platforms =NULL;
cl_device_id *devices=NULL;
cl_context context;
cl_command_queue cmdQueue;
cl_program program;
cl_kernel kernel = NULL;
char *cpOutputData;
int output_size = 8;
cl_mem output_buff;
cl_int status; // to check the output of each API call
const char *source =
"__kernel void Hello( __global char* ch) {\n"
" ch[0]='P';"
" ch[1]='r';"
" ch[2]='i';"
" ch[3]='v';"
" ch[4]='e';"
" ch[5]='t';"
" ch[6]='!';"
" ch[7]='\0';"
"}\0";
printf("GetPlatformIDs... ");
cl_uint numPlatforms = 0;
// STEP 1: Discover and initialize platforms
// Use clGetPlatformIDs to retrieve the number of platforms
status = clGetPlatformIDs(0,
NULL,
&numPlatforms);
// Allocate enough space for each platform
platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
// Fill in platforms with clGetPlatformIDs()
status=clGetPlatformIDs(numPlatforms,
platforms,
NULL);
printf("\nNumber of discovered platforms is %d. ", numPlatforms);
// STEP 2: Discover and initialize devices
printf("OK.\nGetDeviceIDs... ");
cl_uint numDevices = 0;
// Use clGetDeviceIDs() to retrieve the number of devices present
status = clGetDeviceIDs(platforms[0],
CL_DEVICE_TYPE_CPU,
0,
NULL,
&numDevices);
// Allocate enough space for each device
devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
// Fill in devices with clGetDeviceIDs()
clGetDeviceIDs(platforms[0],
CL_DEVICE_TYPE_CPU,
numDevices,
devices,
NULL);
printf("\nNumber of discovered devices is %d. ", numDevices);
// STEP 3: Create a context
printf("OK.\nCreating context... ");
// Create context using clCreateContext() and associate it with the devices
context = clCreateContext(NULL,
numDevices,
devices,
NULL,
NULL,
&status);
// STEP 4: Create a command queue
printf("OK.\nQueue creating... ");
cmdQueue = clCreateCommandQueue(context,
devices[0],
CL_QUEUE_PROFILING_ENABLE,
&status);
// STEP 5: Create device buffers
printf("OK.\nOutput buffer creating... ");
output_buff = clCreateBuffer(context,
CL_MEM_WRITE_ONLY,
sizeof(char)*output_size,
NULL,
&status);
// STEP 6: Create and compile program
printf("OK.\nBuilding program... ");
// Create a program using clCreateProgramWithSource()
program = clCreateProgramWithSource(context,
1,
(const char**)&source,
NULL,
&status);
// Build (compile) the program for the devices with clBuildProgram()
status=clBuildProgram(program,
numDevices,
devices,
NULL,
NULL,
NULL);
// STEP 7: Create a kernel
printf("OK.\nCreating kernel... ");
kernel = clCreateKernel(program,
"Hello",
&status);
// STEP 8: Set kernel arguments
// Associate output buffer with the kernel
printf("OK.\nSetting kernel arguments... ");
status = clSetKernelArg(kernel,
0,
sizeof(cl_mem),
&output_buff);
// STEP 9: Configure the work-item structure
// Define an index space (global work size) of work-items for execution.
// A workgroup size (local work size) is not required, but can be used.
size_t globalWorkSize[1];
// There are 'output_size' work-items
globalWorkSize[0] = output_size;
// STEP 10: Enqueue the kernel for execution
printf("OK.\nExecuting kernel... ");
// Execute the kernel by using clEnqueueNDRangeKernel().
// 'globalWorkSize' is the 1D dimension of the work-items
clEnqueueNDRangeKernel(cmdQueue,
kernel,
1,
NULL,
globalWorkSize,
NULL,
0,
NULL,
NULL);
clFinish(cmdQueue);
// STEP 11: Read the output buffer back to the host
printf("OK.\nReading buffer... ");
// Allocate space for the data to be read
cpOutputData = (char*)malloc(output_size*sizeof(char));
// Use clEnqueueReadBuffer() to read the OpenCL output buffer to the host output array
clEnqueueReadBuffer(cmdQueue,
output_buff,
CL_TRUE,
0,
output_size,
cpOutputData,
0,
NULL,
NULL);
printf("\nPrinting output data: \n");
printf("%s", cpOutputData); // print as data, never as a format string
// STEP 12: Releasing resources
printf("\n...Releasing OpenCL resources... ");
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseCommandQueue(cmdQueue);
clReleaseMemObject(output_buff);
clReleaseContext(context);
printf("OK.\n...Releasing host resources... ");
free(cpOutputData);
free(platforms);
free(devices);
printf("OK.\nEnd of program. Bye!\n");
system("PAUSE");
return 0;
}
My program's execution output is here.
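One thing that may be worth checking in the listing above is the `'\0'` written inside the kernel source string. In a C string literal, a single-backslash `\0` becomes a real NUL byte in the host string. Because `clCreateProgramWithSource` is called with `NULL` for the lengths argument, the source is treated as NUL-terminated, so the kernel text may end before its closing brace. The sketch below is plain C with no OpenCL calls; the helper `visible_length` and the two shortened source strings are hypothetical and only illustrate how the string is truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the number of characters that a consumer of
   NUL-terminated strings sees in a source buffer. */
size_t visible_length(const char *src)
{
    assert(src != NULL);
    return strlen(src);
}

/* Single backslash: the C compiler turns \0 into a real NUL byte in the
   host string, so the text ends right after the opening quote of the
   character literal and the closing brace is never seen. */
const char truncated_src[] =
    "__kernel void Hello(__global char* ch) {\n"
    " ch[0]='P';"
    " ch[1]='\0';"
    "}";

/* Double backslash: the host string holds the two characters \ and 0,
   which the OpenCL C compiler can then read as the literal '\0'. */
const char escaped_src[] =
    "__kernel void Hello(__global char* ch) {\n"
    " ch[0]='P';"
    " ch[1]='\\0';"
    "}";

/* Print the visible and stored sizes of both variants for comparison. */
void report_lengths(void)
{
    printf("truncated: %zu visible of %zu stored\n",
           visible_length(truncated_src), sizeof truncated_src - 1);
    printf("escaped:   %zu visible of %zu stored\n",
           visible_length(escaped_src), sizeof escaped_src - 1);
}
```

If this is what happens in the full program, writing `'\\0'` (or simply `ch[7]=0;`) in the kernel string and dropping the trailing `\0` after the closing brace would be the change to try. Checking the `status` returned by `clBuildProgram` and reading the build log through `clGetProgramBuildInfo` with `CL_PROGRAM_BUILD_LOG` should confirm or rule out a failed build.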