
I am running my OpenCL program on a GeForce GT 610. I know CUDA would be a better alternative, and I may write a CUDA version of my code later; however, for now I am writing in OpenCL so that it can also run on AMD graphics cards.

During initialization I pick out a device to run on. Here is what my program prints out during this phase:

OpenCL Platform 0: NVIDIA CUDA
 ----- OpenCL Device # 0: GeForce GT 610-----
Gflops: 1.620000
Max Compute Units: 1
Max Clock Frequency: 1620
Total Memory of Device (bytes): 1072889856
Max Size of Memory Object Allocation (bytes): 268222464
Max Work Group Size: 1024

My question is why does it say the max compute unit is only 1? According to the spec details on the GeForce site, the card has 48 CUDA cores. I know that CUDA runs better on Nvidia cards, but does it really limit the card this much? Does Nvidia limit OpenCL to 1/48th of the card's power?

Here is the code that prints the output above:

if (clGetPlatformInfo(platforms[platform], CL_PLATFORM_NAME, sizeof(name), name, NULL)) Fatal("Cannot get OpenCL platform name\n");
if (verbose) printf("OpenCL Platform %d: %s\n", platform, name);

... inside forloop ...

  cl_uint compUnits, freq;
  cl_ulong memSize, maxAlloc;
  size_t maxWorkGrps;

  if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(compUnits), &compUnits, NULL)) Fatal("Cannot get OpenCL device units\n");
  if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(freq), &freq, NULL)) Fatal("Cannot get OpenCL device frequency\n");
  if (clGetDeviceInfo(id[devId], CL_DEVICE_NAME, sizeof(name), name, NULL)) Fatal("Cannot get OpenCL device name\n");

  if (clGetDeviceInfo(id[devId], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(memSize), &memSize, NULL)) Fatal("Cannot get OpenCL memory size.\n");
  if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(maxAlloc), &maxAlloc, NULL)) Fatal("Cannot get OpenCL max allocation size.\n");

  if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(maxWorkGrps), &maxWorkGrps, NULL)) Fatal("Cannot get OpenCL max work group size\n");

  int Gflops = compUnits * freq;

  // Format specifiers match the argument types: cl_uint -> %u,
  // cl_ulong -> %llu (with a cast), size_t -> %zu.
  if (verbose) printf(" ----- OpenCL Device # %d: %s-----\n"
    "Gflops: %f\n"
    "Max Compute Units: %u\n"
    "Max Clock Frequency: %u\n"
    "Total Memory of Device (bytes): %llu\n"
    "Max Size of Memory Object Allocation (bytes): %llu\n"
    "Max Work Group Size: %zu\n",
    devId,
    name,
    1e-3*Gflops,
    compUnits,
    freq,
    (unsigned long long)memSize,
    (unsigned long long)maxAlloc,
    maxWorkGrps);

1 Answer


My question is why does it say the max compute unit is only 1?

The compute unit being referred to here corresponds to an NVIDIA GPU SM (streaming multiprocessor). That GPU has exactly one SM, which has 48 cores inside it.

So you're not limited to a single core or 1/48th of the capability of that GPU. Access to that compute unit means your program will have access to the 48 cores contained in it.
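The arithmetic can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, the cores-per-unit figure is not something CL_DEVICE_MAX_COMPUTE_UNITS reports (it is passed in by hand, 48 for this GPU), and the peak estimate assumes each core completes 2 floating-point operations per clock, which is an assumption rather than something stated in the question.

```c
/* Total cores = compute units (SMs on NVIDIA) x cores per compute unit.
 * OpenCL reports only the first factor; cores_per_unit is supplied by the
 * caller (48 for the GT 610 discussed above). */
unsigned estimate_total_cores(unsigned compute_units, unsigned cores_per_unit)
{
    return compute_units * cores_per_unit;
}

/* Rough peak throughput in GFLOPS.
 * clock_mhz is the value returned for CL_DEVICE_MAX_CLOCK_FREQUENCY.
 * flops_per_cycle is an assumption (2 if a multiply-add counts as two
 * operations); dividing by 1000 converts MHz-based MFLOPS to GFLOPS. */
double estimate_peak_gflops(unsigned total_cores, unsigned clock_mhz,
                            unsigned flops_per_cycle)
{
    return (double)total_cores * clock_mhz * flops_per_cycle / 1000.0;
}
```

With the values printed in the question (1 compute unit, 1620 MHz) and 48 cores per unit, estimate_total_cores(1, 48) gives 48 and estimate_peak_gflops(48, 1620, 2) gives about 155.5, rather than the 1.62 obtained from multiplying compute units by clock frequency alone.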