5
votes

Which clock is measured by clock() and clock64() in CUDA?

According to the CUDA documentation, the clock is a 'per-multiprocessor counter'. As I understand it, this refers to the primary GPU clock (not the shader clock).

But when I measure clock counts and convert them to time values using the primary GPU clock frequency, the results are twice as large as the real values (I measure the real values as the kernel execution time from host code using CUDA events). This suggests that clock() counts shader clock cycles rather than primary GPU clock cycles.

How can I resolve this confusion?

EDIT: I calculated the primary GPU clock frequency by dividing the clock rate I get from cudaGetDeviceProperties by 2. As far as I understand, the value given by cudaGetDeviceProperties is the shader clock frequency.

1
primary GPU clock / Graphics Core Clock / Graphic Clock / Core Clock: the clock rate at which the streaming multiprocessor runs. shader clock / Shader Core Clock / Processor Clock / GPU clock: the clock rate at which the execution units (CUDA cores) run. This is twice the value of the primary GPU clock. This is how I have understood it. - Optimus
I can confirm that on Fermi devices, cudaDeviceProp::clockRate is the shader clock rate, that is, double the value of the "primary" GPU clock. On Kepler devices, the two are the same. The answer would be more certain if you told us which device you are using. Not sure about clock() and clock64(); you are probably right in your assumption. - void_ptr
I think @Optimus is referring to the following: on older GPUs (e.g. the Fermi family), the execution units run at twice the clock rate of the rest of the graphics domain (this is sometimes referred to as the "hot clock"). nvidia-smi reports these as "graphics" and "SM" clocks, respectively. For example, on my Fermi-based Quadro 2000, the former is reported as 625 MHz, the latter as 1251 MHz. As far as I know, starting with Kepler all of the non-memory domain of a GPU runs at the same speed, i.e. there is no more SM hot clock. - njuffa
My device is a Quadro 2000D. The clock frequency given by 'cudaDeviceProp::clockRate' is 1251 MHz, which is the shader clock frequency. The reason for my confusion is that the CUDA documentation says 'per-multiprocessor counter', which I took to refer to the primary GPU clock. - Optimus
@njuffa: How did you get 625 MHz? Is it from a datasheet or from a CUDA function? - Optimus

1 Answer

5
votes

It's true that the CUDA documentation says clock() and clock64() return a 'per-multiprocessor counter'. But on the Fermi architecture, what clock() and clock64() actually return is the shader clock counter.

The clockRate returned by cudaGetDeviceProperties is the shader clock frequency.

So to compute the time, we have to divide the clock count from clock() or clock64() by the shader clock frequency from cudaGetDeviceProperties, without halving it first. Note that clockRate is reported in kHz.
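As a minimal sketch of that conversion, the host-side helper below turns a cycle count into milliseconds. The function name `cycles_to_ms` is hypothetical (not part of CUDA); it assumes the cycle count was obtained on the device as the difference of two clock64() readings, and that the rate passed in is cudaDeviceProp::clockRate in kHz.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUDA API): convert a cycle count, e.g. the
// difference of two clock64() readings taken in a kernel, to milliseconds.
//
// clock_rate_khz is expected to be cudaDeviceProp::clockRate, which CUDA
// reports in kHz. One kHz is 1000 cycles per second, i.e. one cycle per
// millisecond, so cycles / kHz is already a value in milliseconds.
inline double cycles_to_ms(std::int64_t cycles, int clock_rate_khz) {
    return static_cast<double>(cycles) / static_cast<double>(clock_rate_khz);
}
```

For illustration, taking the 1251 MHz from the question as 1,251,000 kHz, `cycles_to_ms(1251000, 1251000)` gives 1.0 ms. Passing the halved rate instead, `cycles_to_ms(1251000, 625500)`, gives 2.0 ms, which is the factor-of-two discrepancy described in the question.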