I wrote some Java code that uses JCuda to execute some CUDA kernels. I would like to profile the application in order to understand how the streams overlap. I am able to use CUDA event calls such as cudaEventElapsedTime to get the execution time of a kernel, but I do not know how to get the starting and ending timestamps for the same kernel.
I know nvprof can generate such results and display the timelines, but I have not found a way to run nvprof with a Java application.
Edit: Thanks to the answers, I now understand how to use nvprof to profile a Java application. I would still prefer to get the starting and ending times using cudaEvent calls, so that I have more control. It seems nvprof can get that information, but are there no APIs for an end user to do the same?
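A minimal sketch of one way this might be approached with events alone, assuming a reference event is recorded once before any kernel is launched. Calling cudaEventElapsedTime with the reference event and a kernel's start event, and again with the reference event and its stop event, would then give start and end offsets in milliseconds on a common time axis. The helper below is plain Java and only shows how such offsets could be compared to check whether two kernels overlapped. The names `KernelTimeline` and `Interval` and the sample values are hypothetical and not part of JCuda; the JCuda calls that would supply the numbers are indicated in comments. How accurate the offsets are when the events are recorded in different streams has not been verified here.

```java
import java.util.Locale;

// Hypothetical helper (not part of JCuda): compares kernel intervals that are
// expressed as millisecond offsets from a common reference event.
final class KernelTimeline {

    /** One kernel's interval, in milliseconds since the reference event. */
    static final class Interval {
        final String name;
        final float startMs;
        final float endMs;

        Interval(String name, float startMs, float endMs) {
            if (endMs < startMs) {
                throw new IllegalArgumentException("end before start: " + name);
            }
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        /** Execution time of the kernel, i.e. what a start/stop event pair reports. */
        float durationMs() {
            return endMs - startMs;
        }
    }

    /** Time during which both intervals were running; 0 if they are disjoint. */
    static float overlapMs(Interval a, Interval b) {
        float start = Math.max(a.startMs, b.startMs);
        float end = Math.min(a.endMs, b.endMs);
        return Math.max(0f, end - start);
    }

    public static void main(String[] args) {
        // Assumption: in a real run each offset would be filled in by
        //   cudaEventElapsedTime(ms, reference, kernelStart)  -> startMs
        //   cudaEventElapsedTime(ms, reference, kernelStop)   -> endMs
        // after synchronizing on the stop event. The values below are made up.
        Interval a = new Interval("kernelA", 1.0f, 4.0f);
        Interval b = new Interval("kernelB", 3.0f, 6.0f);

        for (Interval k : new Interval[] { a, b }) {
            System.out.printf(Locale.ROOT, "%s: start=%.3f ms, end=%.3f ms, duration=%.3f ms%n",
                    k.name, k.startMs, k.endMs, k.durationMs());
        }
        System.out.printf(Locale.ROOT, "overlap: %.3f ms%n", overlapMs(a, b));
    }
}
```

Recording one extra event per stream immediately before and after each launch is what would make the offsets line up on a single axis; the comparison itself is then ordinary interval arithmetic.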
sh file, but am not sure (I haven't used the visual profiler actively for a while, because it didn't work with JCuda, and never used it on Linux at all, but conceptually, I think that it should work...) – Marco13