Getting detailed information about compiled OpenCL kernel on NVidia

Question

Is there a way to get detailed information about how an OpenCL kernel was compiled on NVidia platforms (or on other platforms)? I am interested in either external tools or tests that can be put into the kernel. Specifically:

Did vectorization succeed, and how did the work items get grouped into warps?
If work items inside a work group go into different branches, did the compiler optimize it so that they still execute in parallel?
Did private memory variables get mapped to registers in the multiprocessor, or were they put into local/global memory? (Some architectures have more private memory per work group than local memory)

Can this information be seen in the PTX assembly output, or is this still higher level?

mogu mogu · Accepted Answer · 2018-01-04T15:13:54

This is all compiler-level metadata. Some of it is available through the generic OpenCL API, but the details you are asking for are far too low-level for that. They might be available through some Nvidia OpenCL extension, though I'm not familiar with those. Your best bet is probably to find tools that work at the PTX level and feed them the OpenCL program binaries.
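As a rough illustration of what a PTX-level check could look like: on NVidia's OpenCL implementation, the binary returned by clGetProgramInfo with CL_PROGRAM_BINARIES is typically PTX text. Assuming you already have that text in a string, the sketch below counts accesses to the PTX .local state space (where private data that is not kept in registers ends up) and branch instructions. The names ptx_summary, count_occurrences and summarize_ptx are hypothetical, and this is plain substring counting rather than a real PTX parser, so treat the numbers as hints only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of a PTX listing; not part of any NVidia or OpenCL API. */
typedef struct {
    size_t local_loads;   /* occurrences of "ld.local" */
    size_t local_stores;  /* occurrences of "st.local" */
    size_t branches;      /* occurrences of "bra" (crude: also matches identifiers containing "bra") */
} ptx_summary;

/* Count non-overlapping occurrences of needle in text. */
size_t count_occurrences(const char *text, const char *needle)
{
    assert(text != NULL && needle != NULL);

    size_t step = strlen(needle);
    size_t count = 0;

    if (step == 0) {
        return 0;  /* an empty needle would match everywhere; treat it as no match */
    }
    for (const char *p = strstr(text, needle); p != NULL; p = strstr(p + step, needle)) {
        ++count;
    }
    return count;
}

/* Scan PTX text for instructions that hint at private data living outside registers. */
ptx_summary summarize_ptx(const char *ptx)
{
    ptx_summary s;
    s.local_loads  = count_occurrences(ptx, "ld.local");
    s.local_stores = count_occurrences(ptx, "st.local");
    s.branches     = count_occurrences(ptx, "bra");
    return s;
}
```

Calling summarize_ptx on the dumped text and seeing non-zero local_loads or local_stores suggests that some private data was already placed in local memory at the PTX stage. A count of zero does not prove everything stays in registers: PTX uses virtual registers, so spills introduced later by the final assembler are not visible at this level.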
