1 vote

I'm programming on a Keil board and am trying to count the number of clock periods taken for execution by a code block inside a C function.

Is there a way to get the time with microsecond precision before and after the code block, so that I can take the difference and multiply it by the number of clock periods per microsecond to compute the clock periods consumed by the block? Something like the sketch below.
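
For instance (get_time_us() is a hypothetical microsecond-resolution clock, which is exactly the function I'm looking for; the 72 MHz figure is just an example):

```c
#include <stdint.h>

extern uint32_t get_time_us(void);  /* hypothetical microsecond-resolution clock */
extern void code_block(void);       /* the block I want to measure */

void example(void)
{
    const uint32_t cycles_per_us = 72;   /* e.g. a 72 MHz core clock */

    uint32_t t0 = get_time_us();
    code_block();
    uint32_t t1 = get_time_us();

    /* clock periods consumed by the block */
    uint32_t cycles = (t1 - t0) * cycles_per_us;
    (void)cycles;
}
```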

The clock() function in time.h gives time in seconds, so the difference comes out as 0 because the code block I'm trying to measure is small.

If this is not a good way to solve the problem, are there alternatives?

Have you tried clock_gettime() with the various clock options, such as CLOCK_MONOTONIC_RAW or CLOCK_PROCESS_CPUTIME_ID? For them to be precise you need to enable high-resolution timers in your kernel (on regular distributions this is already enabled by default, but those are not exactly embedded systems). – EdwardH
clock_gettime() is available on Linux only, right? It's not available in the libraries (time.h) that come with the Keil uVision IDE. – rutxkl
@EdwardH: rutxkl has failed to specify exactly what board/chip he is using, but most Keil boards are low-end ARM microcontrollers and are unlikely to be running Linux, or indeed any OS other than a simple real-time kernel. I think you may have assumed too much! – Clifford
@Clifford: Indeed I have; the embedded-linux tag on the post confused me. – EdwardH
@EdwardH: I did not notice that! Your assumption was perhaps valid, but rutxkl's response suggests that the post is mis-tagged. Removed it. – Clifford

3 Answers

5 votes

Read up on the timers in the chip, find one that the operating system/environment you are using has not consumed, and use it directly. This takes some practice: you need to use volatile so the compiler does not re-arrange your code or skip re-reading the timer. You also need to adjust the prescaler on the timer so that you get the most practical resolution without the counter rolling over. Start with a big prescale divisor, convince yourself it is not rolling over, then shorten the divisor until you reach divide-by-one or the desired accuracy. If divide-by-one does not give you enough resolution, call the function many times in a loop and time around that loop.

Remember that any time you change your code to add these measurements, you can and will change the performance of your code; sometimes the change is small enough not to notice, sometimes it is 10% to 20% or more. If you are using a cache, then any line of code you add or remove can change performance by double-digit percentages, and at that point you have to understand much more about timing your code.
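
As a minimal sketch of the idea, assuming a hypothetical free-running 16-bit timer whose count register is memory-mapped at a made-up address (substitute the count register and prescaler of a free timer on your actual chip):

```c
#include <stdint.h>

/* Hypothetical address: replace with the count register of an
 * unused free-running timer on your specific chip. */
#define TIMER_CNT (*(volatile uint16_t *)0x40000024u)

extern void code_block_under_test(void);

/* volatile forces real, in-order reads of the timer */
volatile uint16_t t_start, t_end;

void measure(void)
{
    t_start = TIMER_CNT;            /* read counter before the block */
    code_block_under_test();
    t_end = TIMER_CNT;              /* read counter after the block */

    /* Elapsed timer ticks; unsigned subtraction tolerates one wrap of
     * the 16-bit counter, but the block must run for less than one
     * full timer period for the result to be meaningful. */
    uint16_t ticks = (uint16_t)(t_end - t_start);

    /* CPU cycles = ticks * prescale divisor currently set on the timer */
    (void)ticks;
}
```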

3 votes

The best way to count the number of clock cycles in the embedded world is to use an oscilloscope. Toggle a GPIO pin before and after your code block and measure the time with the oscilloscope. The measured time multiplied by the CPU frequency is the number of CPU clock cycles spent.
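
A minimal sketch of the toggle, assuming hypothetical write-only set/clear registers at made-up addresses (most vendors provide an atomic equivalent, e.g. GPIOx->BSRR on STM32):

```c
#include <stdint.h>

/* Hypothetical set/clear register addresses: substitute the GPIO
 * registers of your actual chip. */
#define GPIO_SET (*(volatile uint32_t *)0x40020018u)
#define GPIO_CLR (*(volatile uint32_t *)0x4002001Cu)
#define PIN_MASK (1u << 5)

extern void code_block_under_test(void);

void measure_with_scope(void)
{
    GPIO_SET = PIN_MASK;        /* rising edge: trigger the scope here */
    code_block_under_test();
    GPIO_CLR = PIN_MASK;        /* falling edge: pulse width = execution time */
}
/* cycles = pulse width (seconds) * CPU clock frequency (Hz) */
```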

1 vote

You have omitted to say which processor is on the board (far more important than the brand of board!). If the processor includes ETM and you have a ULINK-Pro or another trace-capable debugger, then uVision can unintrusively profile the executing code directly at the instruction-cycle level.

Similarly, if you run the code in the uVision simulator rather than on real hardware, you can get cycle-accurate profiling and timing without the need for hardware trace support.

Even without trace capability, uVision's "stopwatch" feature can perform timing between two breakpoints directly. The stopwatch appears at the bottom of the IDE in the status bar. You do need to set the clock frequency in the debugger's trace configuration for the stopwatch to report real time.

A simple approach that requires no special debug or simulator capability is to use an available timer peripheral (or, on Cortex-M devices, the SysTick timer) to timestamp the start and end of execution of a code section; if you have no available timing resource, you can toggle a GPIO pin and monitor it on an oscilloscope. These methods carry some software overhead that is not present in hardware or simulator trace, which may make them unsuitable for very short code sections.
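
For the Cortex-M SysTick case, a minimal sketch using the standard CMSIS register definitions might look like the following. The device header name is an assumption; use whichever header your Keil project already includes, since it pulls in the CMSIS SysTick definitions:

```c
#include <stdint.h>
#include "stm32f10x.h"   /* hypothetical device header; provides CMSIS SysTick */

extern void code_block_under_test(void);

uint32_t measure_cycles(void)
{
    SysTick->LOAD = 0x00FFFFFFu;                /* maximum 24-bit reload */
    SysTick->VAL  = 0u;                         /* any write clears the counter */
    SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk  /* clock from the core clock */
                  | SysTick_CTRL_ENABLE_Msk;    /* start counting */

    uint32_t start = SysTick->VAL;
    code_block_under_test();
    uint32_t end = SysTick->VAL;

    SysTick->CTRL = 0u;                         /* stop the timer */

    /* SysTick counts down, so elapsed cycles = start - end (mod 2^24);
     * valid only while the block runs for less than one 24-bit period. */
    return (start - end) & 0x00FFFFFFu;
}
```

This measures in core-clock cycles directly, so no microsecond conversion is needed; the two SysTick reads themselves add a few cycles of overhead, which matters for very short blocks.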