What's the newer version of, or the alternative to, the CUDA Occupancy Calculator?

Question

I'm working with a Tesla P100, which has compute capability 6.0. I'd like to find a tool that automatically gives me the best grid and block sizes for my kernel code.

I recently discovered the CUDA Occupancy Calculator (the .xls spreadsheet). But when I downloaded it, I realized it's a bit outdated, since it only covers compute capabilities up to 2.1.

I searched for a newer version of that spreadsheet that supports higher compute capabilities, but nothing showed up.
So I looked for an alternative and found that the Occupancy APIs were introduced in CUDA 6.5. Is this the newer alternative to the spreadsheet?
I also found that tool on GitHub. Is it a good tool? Can I consider it an alternative, or is it better to use the aforementioned Occupancy APIs?

I was also asking myself: can the CUDA profilers (nvprof or Nsight) estimate occupancy and suggest an optimal block/grid size?

I'm not very experienced with these tools, so I apologize if these are trivial questions.

An updated version of the occupancy calculator ships in every CUDA toolkit. If you have the toolkit installed, you have the spreadsheet. Look under tools in the CUDA install directory — talonmies

Unknown Unknown · Accepted Answer · 2019-07-02T13:55:43

An updated version of the CUDA occupancy calculator spreadsheet ships with the CUDA toolkit, so when you install the toolkit, the Excel spreadsheet is also installed on your machine. The easiest way to locate it is probably a file-search utility for your OS.

The CUDA occupancy API allows you to make the same calculations at runtime.

NVIDIA profilers offer some capability to inspect achieved occupancy. For example, nvvp can display achieved occupancy, and nvprof can collect the achieved_occupancy metric. You may wish to simply search the profiler docs for the word "occupancy". These tools don't estimate optimal block and grid sizes, but they can indicate whether occupancy may be a performance limiter for your application. They can also report the actual block and grid sizes used for each kernel launch.
