I'm writing an OpenCL kernel to perform a brute-force AI for a puzzle game, but I have a problem with my kernel code and/or the auxiliary function it calls. The global work size set by clEnqueueNDRangeKernel is 60, one work-item per board. The kernel's arguments and code follow (I'm confident the inputs are being passed in correctly).
The inputs to the kernel are as follows:
__global char * in, // dummy input for testing purposes
__global char * board_in, // a large char array containing 60 boards
__global int * lookup, // an array that I use to quickly get the score of scoring moves
Outputs:
__global char * out, //dummy output for testing
__global int * score_out, //an array of 60 scores: one for each board
__global int * row_out, // an array of 60 rows: one for each board evaluated
__global int * col_out // an array of 60 cols: ...
__kernel void helloworld(__global char* in,
                         __global char* board_in,
                         __global int* lookup,
                         __global char* out,
                         __global int* score_out,
                         __global int* row_out,
                         __global int* col_out)
{
    int num = get_global_id(0);
    char workingBoard[72];
    int scoreMat[64];
    // Set up the board array for each thread to use
    for(int k=0; k<72; k++)
    {
        workingBoard[k] = board_in[num*BOARDSIZE+k];
    }
    // Make a copy of the score matrix for each thread to use
    for(int j=0; j<64; j++)
    {
        scoreMat[j] = lookup[j];
    }
    int s=0;
    int r=0;
    int c=0;
    findBestMove(workingBoard,scoreMat,&s,1,&r,&c);
    col_out[num] = ?????????
    score_out[num] = ???????????
    row_out[num] = ???????????????
}
The function findBestMove works like this (it's pretty well tested; I've used it in a CPU implementation for a while): it takes a board (char array), a score-lookup array, a pointer to the score of the best move, the current search depth, and pointers to the row and column. It is supposed to set the score, row, and column through those pointers. It calls other functions that I define in the same file.
If I run this code snippet on the CPU, I get the proper output:
// workerBoard and lookuparr are set before this point to hold the same data
// that the kernel thread is supposed to have
int s=0;
int r=0;
int c=0;
findBestMove(workerBoard,lookuparr,&s,1,&r,&c);
cout<<s<<","<<r<<","<<c<<endl;
When I run my kernel code, execution never gets past the function call. The function is defined in the same file as the kernel, and doesn't use dynamic memory, function pointers, recursion, or global memory (outside of the kernel args). I do use some #define macros.
I want to set the ???? sections of my kernel to r, c and s, but as mentioned, I never get there. Am I making any critical mistakes? (Note: the kernel passes my code checker and AMD's Kernel Analyzer.) Also, I'm pretty new to OpenCL, so any tips are welcome as well. If I can provide any more information to help answer this question, let me know!
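To make the intended data flow concrete, here is a minimal sketch in plain C (not OpenCL C, so that it can be run and checked on the host) of what each work-item is meant to do once the call returns. The helper evaluate_board is a hypothetical name, and the findBestMove shown here is only a placeholder, not the real search: it reports the largest lookup entry, assuming purely for illustration that the 64 entries form an 8x8 grid. BOARDSIZE is assumed to be 72, matching the size of workingBoard.

```c
#define BOARDSIZE 72   /* assumption: equals the size of workingBoard */
#define NUM_BOARDS 60  /* the global work size from the question */

/* Placeholder for findBestMove: NOT the real search. It only shows the
   out-parameter convention by reporting the largest lookup entry,
   assuming (for illustration only) that the 64 entries form an 8x8 grid. */
static void findBestMove(const char *board, const int *lookup,
                         int *score, int depth, int *row, int *col)
{
    (void)board;  /* unused by this placeholder */
    (void)depth;  /* unused by this placeholder */
    int best = 0;
    for (int i = 1; i < 64; i++) {
        if (lookup[i] > lookup[best]) {
            best = i;
        }
    }
    *score = lookup[best];
    *row = best / 8;
    *col = best % 8;
}

/* Mirrors the kernel body for one work-item; num stands in for
   get_global_id(0). */
static void evaluate_board(int num, const char *board_in, const int *lookup,
                           int *score_out, int *row_out, int *col_out)
{
    char workingBoard[BOARDSIZE];
    int scoreMat[64];

    /* Private copy of this work-item's board */
    for (int k = 0; k < BOARDSIZE; k++) {
        workingBoard[k] = board_in[num * BOARDSIZE + k];
    }
    /* Private copy of the score lookup table */
    for (int j = 0; j < 64; j++) {
        scoreMat[j] = lookup[j];
    }

    int s = 0;
    int r = 0;
    int c = 0;
    findBestMove(workingBoard, scoreMat, &s, 1, &r, &c);

    /* The ???? sections: one result slot per board */
    score_out[num] = s;
    row_out[num] = r;
    col_out[num] = c;
}
```

Calling evaluate_board once for each num from 0 to NUM_BOARDS - 1 corresponds to what the 60 work-items are expected to do in parallel.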