Reading Shared/Local Memory Store/Load bank conflicts hardware counters for OpenCL executable under Nvidia

Question

nvprof can read the shared memory bank conflict counters for a CUDA executable:

nvprof --events shared_st_bank_conflict,shared_ld_bank_conflict my_cuda_exe

However, this does not work for code that uses OpenCL rather than CUDA.

Is there any way to extract these counters outside nvprof from an OpenCL environment, maybe directly from the PTX?
Alternatively, is there any way to convert the PTX assembly generated by the NVIDIA OpenCL compiler (obtained using clGetProgramInfo with CL_PROGRAM_BINARIES) into a CUDA kernel and run it using cuModuleLoadDataEx, and thus be able to use nvprof?
Is there any CPU simulation backend that allows setting parameters such as bank size?

Additional option:

Use a converter from OpenCL to CUDA code that handles features missing from CUDA, such as vloadn/vstoren, float16, and various other accessors. #define-based mappings work only for simple kernels. Is there any tool that provides this?

Can you pass the OpenCL-generated PTX to cuModuleLoadDataEx? There's no guarantee that ptxas compiles the PTX to the same SASS in both cases, but it's a reasonable guess. It's possible that the options passed to ptxas differ between OpenCL and CUDA (e.g. rounding rules). There's no guarantee that you'd be profiling the same program, but perhaps it's a good approximation. - Tim

talonmies · Accepted Answer · 2020-10-25T00:35:17

Is there any way to extract these counters outside nvprof from an OpenCL environment, maybe directly from the PTX?

No. Nor is there in CUDA, nor in compute shaders in OpenGL, DirectX or Vulkan.

Alternatively is there any way to convert PTX assembly generated from nvidia OpenCL compiler using clGetProgramInfo with CL_PROGRAM_BINARIES to CUDA kernel and run it using cuModuleLoadDataEx and thus be able to use nvprof?

No. OpenCL PTX and CUDA PTX are not the same and can't be used interchangeably.

Is there any simulation CPU backend that allows to set such parameters as bank size etc?

Not that I am aware of.
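
Short of a real simulator, the bank mapping itself can be modelled by hand to estimate how badly a given access pattern conflicts. Below is a minimal sketch in Python under a simplified assumption: consecutive words map to consecutive banks, and threads reading the same word are served by one broadcast. The function names and the defaults (32 banks, 4-byte bank width, 32 lanes) are illustrative choices, not values read from any hardware or tool, and the result is only an estimate of the serialization degree, not a replacement for the hardware counters.

```python
from collections import defaultdict


def strided_addresses(stride_elems, elem_bytes=4, lanes=32):
    """Byte addresses touched by `lanes` work-items, each reading one element
    at index lane * stride_elems (hypothetical helper for building patterns)."""
    return [lane * stride_elems * elem_bytes for lane in range(lanes)]


def bank_conflict_degree(byte_addresses, num_banks=32, bank_width_bytes=4):
    """Estimate how many serialized passes one warp-wide access needs.

    Simplified model (an assumption, not a hardware specification):
    word = address // bank_width_bytes, bank = word % num_banks, and lanes
    that hit the same word in a bank cost a single pass (broadcast).
    Returns 1 for a conflict-free access and 0 for an empty access.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width_bytes
        words_per_bank[word % num_banks].add(word)
    # The busiest bank determines how many passes are needed.
    return max((len(words) for words in words_per_bank.values()), default=0)


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        degree = bank_conflict_degree(strided_addresses(stride))
        print(f"stride {stride:2d}: {degree}-way")
```

Changing `num_banks` or `bank_width_bytes` shows how the same pattern behaves under a different bank configuration, for example a stride of 2 floats with an 8-byte bank width.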
