It is possible to use nvprof to read the shared-memory bank conflict counters for a CUDA executable:
nvprof --events shared_st_bank_conflict,shared_ld_bank_conflict my_cuda_exe
However, this does not work for code that uses OpenCL rather than CUDA.
- Is there any way to extract these counters outside nvprof from an OpenCL environment, maybe directly from PTX?
- Alternatively, is there any way to convert the PTX assembly generated by the NVIDIA OpenCL compiler (obtained using clGetProgramInfo with CL_PROGRAM_BINARIES) to a CUDA kernel, run it using cuModuleLoadDataEx, and thus be able to use nvprof?
- Is there any CPU simulation backend that allows setting parameters such as bank size?
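To make the last point concrete, here is a minimal sketch of the kind of configurable model I have in mind. It is a toy, not an existing tool: the function names conflict_degree and strided_access are hypothetical. It assumes a 32-thread warp, a bank index of (byte address / bank width) mod number of banks, and that threads reading the same word are served by a single broadcast. Real hardware rules differ between architectures, so this only approximates what the hardware counters would report.

```python
from collections import defaultdict


def conflict_degree(byte_addresses, num_banks=32, bank_width=4):
    """Number of serialized passes one warp-wide access needs in this toy model.

    Assumption: accesses to the same word are broadcast and do not conflict;
    distinct words that map to the same bank are serialized.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width          # index of the bank-wide word
        words_per_bank[word % num_banks].add(word)
    # The busiest bank determines how many passes the access takes.
    return max((len(words) for words in words_per_bank.values()), default=0)


def strided_access(stride_words, warp_size=32, element_size=4):
    """Byte addresses for thread i accessing element i * stride_words."""
    return [tid * stride_words * element_size for tid in range(warp_size)]


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        print(stride, conflict_degree(strided_access(stride)))
```

A result of 1 means conflict-free. Changing num_banks or bank_width shows how the same access pattern behaves under a different bank configuration, which is the kind of parameter I would like a simulator to expose.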
Additional option:
- Use a converter from OpenCL to CUDA code that covers features missing from CUDA, such as vloadn/vstoren, float16, and various other accessors. Using #define works only for simple kernels. Is there any tool that provides this?
cuModuleLoadDataEx? There's no guarantee that the ptxas compilation from PTX to SASS is the same, but it's a reasonable guess. It's possible that the options passed to ptxas differ between OpenCL and CUDA (e.g. rounding rules). There's no guarantee that you'd be profiling the same programs, but perhaps it's a good approximation. – Tim