I'm working with a Tesla P100, which has compute capability 6.0. I'd like to find a tool that automatically suggests the best grid and block sizes for my kernel code.
I recently discovered the CUDA Occupancy Calculator (the .xls spreadsheet), but when I downloaded it I realized it's a bit outdated: it only covers compute capabilities up to 2.1.
I searched for a newer version of the spreadsheet that supports higher compute capabilities, but nothing showed up.
So I looked for an alternative and found that the Occupancy APIs were introduced in CUDA 6.5. Are they the newer alternative to the spreadsheet?
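For context, this is roughly how I understand the occupancy API would be used. It is only a minimal sketch: `myKernel` and the problem size `n` are placeholders I made up for illustration, not my real code, and I'm assuming no dynamic shared memory and no block size limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical; stands in for the real kernel).
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
    }
}

int main()
{
    const int n = 1 << 20;   // assumed problem size
    int minGridSize = 0;     // minimum grid size needed to reach full occupancy
    int blockSize = 0;       // block size suggested by the runtime

    // Ask the runtime for a block size that maximizes theoretical occupancy.
    // Last two arguments: dynamic shared memory per block (0) and
    // block size limit (0 means no limit).
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, myKernel, 0, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Round up so that every element is covered by a thread.
    int gridSize = (n + blockSize - 1) / blockSize;

    float *d_data = nullptr;
    err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    myKernel<<<gridSize, blockSize>>>(d_data, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    printf("suggested block size: %d, min grid size: %d, launched grid: %d\n",
           blockSize, minGridSize, gridSize);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

As far as I can tell, the suggested block size only maximizes theoretical occupancy, which may not be the same thing as the best performance for a given kernel.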
I also found this tool on GitHub. Is it a good tool? Can I consider it an alternative, or is it better to use the aforementioned Occupancy APIs?
I was also wondering: can the CUDA profilers (nvprof or Nsight) estimate occupancy and suggest an optimal block/grid size?
I'm not very experienced with these tools, so I apologize if these are trivial questions.