What is the clock measured by clock() and clock64() in CUDA?
According to the CUDA documentation, the clock is a 'per-multiprocessor counter'. My understanding is that this refers to the primary GPU clock (not the shader clock).
But when I measure clock counts and convert them to time values using the primary GPU clock frequency, the results I get are twice as large as the real values (which I measure from host code using CUDA events around the kernel execution). This suggests that clock() counts at the shader clock frequency instead of the primary GPU clock.
How can I resolve this confusion?
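A minimal sketch of what I am doing (kernel and variable names are illustrative, error checking omitted):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Record elapsed clock64() cycles inside the kernel.
    __global__ void busyKernel(long long *cycles, float *sink, int iters)
    {
        long long start = clock64();       // per-multiprocessor counter
        float x = threadIdx.x;
        for (int i = 0; i < iters; ++i)
            x = x * 0.999999f + 0.000001f; // dummy work
        long long stop = clock64();
        sink[threadIdx.x] = x;             // keeps the loop from being optimized away
        if (threadIdx.x == 0)
            cycles[0] = stop - start;
    }

    int main()
    {
        long long *dCycles;
        long long hCycles = 0;
        float *dSink;
        cudaMalloc(&dCycles, sizeof(long long));
        cudaMalloc(&dSink, sizeof(float));

        cudaEvent_t evStart, evStop;
        cudaEventCreate(&evStart);
        cudaEventCreate(&evStop);

        cudaEventRecord(evStart);
        busyKernel<<<1, 1>>>(dCycles, dSink, 1 << 20);
        cudaEventRecord(evStop);
        cudaEventSynchronize(evStop);

        float eventMs = 0.0f;
        cudaEventElapsedTime(&eventMs, evStart, evStop);
        cudaMemcpy(&hCycles, dCycles, sizeof(long long), cudaMemcpyDeviceToHost);

        printf("event time: %.3f ms, clock64() cycles: %lld\n", eventMs, hCycles);
        return 0;
    }

Dividing the cycle count by the primary GPU clock frequency gives roughly twice the event time.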
EDIT: I calculated the primary GPU clock frequency by dividing the clock rate I get from cudaGetDeviceProperties by 2, since, as far as I understand, the value given by cudaGetDeviceProperties is the shader clock frequency.
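A sketch of that calculation (device 0 assumed; whether clockRate is really the shader clock is exactly what I am unsure about):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // clockRate is reported in kHz; my assumption is that it is the
        // shader clock, so the primary GPU clock would be half of it.
        double shaderMHz  = prop.clockRate / 1000.0;
        double primaryMHz = shaderMHz / 2.0;
        printf("shader clock: %.0f MHz, assumed primary clock: %.0f MHz\n",
               shaderMHz, primaryMHz);
        return 0;
    }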
cudaDeviceProp::clockRate is the shader clock rate, that is, double the value of the "primary" GPU clock. On Kepler devices, the two are the same. The answer would be more certain if you told us which device you are using. Not sure about clock() and clock64() - you are probably right in your assumption. – void_ptr
nvidia-smi reports these as the "graphics" and "SM" clocks, respectively. For example, on my Fermi-based Quadro 2000, the former is reported as 625 MHz, the latter as 1251 MHz. Best I know, starting with Kepler all of the non-memory domain of a GPU runs at the same speed, i.e. there is no more SM hot clock. – njuffa
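For reference, the same two clock domains can also be queried programmatically through NVML, the library behind nvidia-smi. A sketch, assuming the NVML header and library shipped with the driver are available (link with -lnvidia-ml; error checking omitted):

    #include <cstdio>
    #include <nvml.h>

    int main()
    {
        unsigned int graphicsMHz = 0, smMHz = 0;
        nvmlDevice_t dev;
        nvmlInit();
        nvmlDeviceGetHandleByIndex(0, &dev);
        // Current clocks of the graphics and SM domains, in MHz.
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS, &graphicsMHz);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smMHz);
        printf("graphics clock: %u MHz, SM clock: %u MHz\n", graphicsMHz, smMHz);
        nvmlShutdown();
        return 0;
    }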