2 votes

I'm having some issues with the CUDA nvprof profiler. Some of the metrics on the documentation site are named differently than in the profiler itself, and the variables used in the formulas don't seem to be explained anywhere on the site, or for that matter anywhere else on the web (I wasn't able to find any valid reference).

I decoded most of those (here: calculating gst_throughput and gld_throughput with nvprof), but I'm still not sure about:

elapsed_cycles
max_warps_per_sm

Does anyone know precisely how to compute those?

I'm trying to use nvprof to assess some 6000 different kernels from the command line, so it is not really viable for me to use the Visual Profiler.
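For context, this is roughly how I'm driving it; the directory layout and the event list here are just placeholders for my actual setup:

# simplified sketch -- one binary per kernel, placeholder event names
for prog in ./kernels/*; do
    nvprof --events active_cycles,gld_request,gst_request "$prog" >> raw_events.log 2>&1
done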

Any help appreciated. Thanks very much!

EDIT: What I'm using:

CUDA 5.0 and a GTX 480, which is compute capability 2.0.

What I've already done:

I've made a script that fetches the formula for each metric from the profiler documentation site, resolves the dependencies for any given metric, extracts the required events through nvprof, and then computes the results from them. This involved a (rather large) sed script that rewrites every variable name appearing on the site into the equivalent name that is actually accepted by the profiler. Basically, I've emulated "grepping" metrics via nvprof; a trimmed sketch of the renaming step is below. I'm just having problems with the two variables above.
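The renaming step looks roughly like this (heavily trimmed; the real script has many more -e expressions, and the file names and the second name pair here are only illustrative):

# illustrative sketch of the site-name -> nvprof-name rewriting
sed -e 's/\belapsed_cycles\b/elapsed_cycles_sm/g' \
    -e 's/\bmax_warps\b/max_warps_per_sm/g' \
    formulas.txt > formulas_nvprof.txt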

Why these particular variables are a problem:

max_warps_per_sm - I'm not sure whether this is simply the hardware bound given by the compute capability, or another metric/event specific to my program that I am somehow missing (which wouldn't be a surprise, as some of the variables in the profiler documentation have 3 (!) different names, all for the same thing).

elapsed_cycles - elapsed_cycles does not appear in the output of nvprof --query-events; there isn't even anything containing the word "elapse", and the only event containing "cycle" is "active_cycles". Could that be it? Is there any other way to obtain it? Would there be any harm in using "gputime" instead of this variable? I don't need absolute numbers; I'm using them to find correlations and analyze code, so if "gputime" = "elapsed_cycles" * CONSTANT, I'm perfectly okay with that.
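If kernel duration is an acceptable substitute, I would just take per-launch times from a separate run, e.g. (assuming my nvprof supports the GPU trace output; the binary name is a placeholder, and a plain nvprof run already prints per-kernel average times anyway):

# per-launch kernel durations, as a possible stand-in for elapsed cycles
nvprof --print-gpu-trace ./some_kernel_binary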

Which version of CUDA are you using? The profiling tools evolve, so we need that information in order to help you. – BenC

1 Answer

2 votes

You can use the following command that lists all the events available on each device:

nvprof --query-events

This is not very complete, but it's a good start to understanding what these events/metrics are. For instance, with CUDA 5.0 and a CC 3.0 GPU, we get:

elapsed_cycles_sm: Elapsed clocks

elapsed_cycles_sm is the number of elapsed clock cycles per multiprocessor. If you want to measure this metric for your program:

nvprof --events elapsed_cycles_sm ./your_program
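Since you are scripting this over thousands of kernels, something along these lines may help keep the output machine-readable (assuming your nvprof build has the --csv and --log-file options; check nvprof --help for your version):

nvprof --csv --log-file results.csv --events elapsed_cycles_sm ./your_program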

max_warps_per_sm is quite straightforward: it is the maximum number of resident warps per multiprocessor. This value depends on the compute capability (see the chart here). It is a hardware limit: no matter what your kernels are, at any given time you will never have more resident warps per multiprocessor than this value. For example, on a CC 2.0 device such as your GTX 480 it is 48 warps (1536 maximum resident threads / 32 threads per warp), and on CC 3.0 it is 64.

Also, more information is available in the profiler's online documentation, with descriptions and formulae.

UPDATE

According to this answer:

active_cycles: Number of cycles a multiprocessor has at least one active warp.
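You can collect it like any other event, and if elapsed_cycles_sm is also exposed on your device, you can request both in one run (nvprof accepts a comma-separated event list) to see how closely they track each other:

nvprof --events active_cycles,elapsed_cycles_sm ./your_program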