x86_64 - Why is timing a program with rdtsc/rdtscp giving unreasonably large numbers?

Question

I'm trying to time a subroutine using rdtscp. This is my procedure:

; Setting up time
rdtscp                      ; Getting time
push rax                    ; Saving timestamp

; for(r9=0; r9<LOOP_SIZE; r9++)
mov r9, 0
lup0:
call subr
inc r9
cmp r9, LOOP_SIZE
jnz lup0

; Calculating time taken
pop rbx                     ; Loading old time
rdtscp                      ; Getting time
sub rax, rbx                ; Calculating difference

If LOOP_SIZE is small enough, I get consistent and expected results. However, when I make it big enough (around 10^9), the result jumps from around 10^9 to around 10^20.

; Result with "LOOP_SIZE equ 100000000"
971597237
; Result with "LOOP_SIZE equ 1000000000"
18446744072281657066

The routine I use to display the numbers treats them as unsigned, so I imagine the large number is actually a negative number and an overflow happened. However, 971597237 is nowhere near the 64-bit integer limit, so, assuming the problem is an overflow, why is it happening?

rdtsc annoyingly puts its result in EDX:EAX even in 64-bit mode. felixcloutier.com/x86/rdtsc. You're only saving / using the low 32 bits of the TSC, and getting a 32-bit unsigned difference, sign-extended to 64-bit because you're computing it with sub rax, rbx on the zero-extended 32-bit values instead of sub eax, ebx. — Peter Cordes

Luiz Martins · Accepted Answer · 2020-11-19T03:18:03

The problem is that, as per the documentation, rdtscp does not put its result in rax but in edx:eax (the high 32 bits in edx and the low 32 bits in eax), even in 64-bit mode.
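
To see where the huge value comes from, here is a minimal sketch with made-up timestamps. The constants are hypothetical, chosen so that the low 32 bits wrap between the two readings, and a plain mov stands in for the question's push/pop. Like the question's code, it keeps only the low half of each reading:

```nasm
; Hypothetical readings: start TSC = 0x00000001FFFFFFF0, end TSC = 0x0000000200000010
; (true elapsed time = 0x20 = 32 ticks). Only the low halves are kept, as in the question.
mov eax, 0xFFFFFFF0         ; Low half of the start reading (writing eax zeroes the upper half of rax)
mov rbx, rax                ; Saved "old time" = 0x00000000FFFFFFF0
mov eax, 0x00000010         ; Low half of the end reading, after the low 32 bits wrapped
sub rax, rbx                ; rax = 0xFFFFFFFF00000020, which is huge when printed as unsigned
; Using sub eax, ebx instead would leave rax = 0x20, the difference modulo 2^32
```

The value in the question fits this pattern: 18446744072281657066 is 2^64 - 1427894550, so the second eax reading was 1427894550 smaller than the first.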

So, if you want the full 64-bit value in rax, you have to merge in the high bits from rdx:

; Setting up time
rdtscp                      ; Getting time
shl rdx, 32                 ; Shifting rdx to the correct bit position
add rax, rdx                ; Adding both to make timestamp
push rax                    ; Saving timestamp

; [...stuff...]

; Calculating time taken
rdtscp                      ; Getting time
pop rbx                     ; Loading old time (below rdtscp)
shl rdx, 32                 ; Shifting rdx to the correct bit position
add rax, rdx                ; Adding both to make timestamp
sub rax, rbx                ; Calculating difference

Edit: Moved pop rbx one line down, below rdtscp. As pointed out by Peter, rdtscp overwrites rax, rdx and rcx. In your example that's not a problem, but if you popped into rcx there instead, rdtscp would overwrite it, so it's good practice to pop only after rdtscp.

Also, you can avoid the push/pop pair by saving the old timestamp in a register that your subroutine doesn't modify:

; Setting up time
rdtscp                      ; Getting time
shl rdx, 32                 ; Shifting rdx to the correct bit position
lea r12, [rdx + rax]        ; Adding both to make timestamp, and saving it

; [...stuff (that doesn't use r12)...]

; Calculating time taken
rdtscp                      ; Getting time
shl rdx, 32                 ; Shifting rdx to the correct bit position
add rax, rdx                ; Adding both to make timestamp
sub rax, r12                ; Calculating difference
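
Since the shift-and-merge sequence appears after every rdtscp, it could be wrapped in a NASM macro. This is only a sketch: READ_TSC is a made-up name, and it assumes NASM's %macro syntax. It uses or instead of add, which gives the same result here because rdtscp clears the upper 32 bits of rax and rdx.

```nasm
; READ_TSC (hypothetical helper): leaves the full 64-bit timestamp in rax.
; Clobbers rax, rdx and rcx, because rdtscp writes all three.
%macro READ_TSC 0
    rdtscp                  ; edx:eax = timestamp, ecx = processor ID
    shl rdx, 32             ; Move the high half into bits 63..32
    or  rax, rdx            ; Merge the two halves into rax
%endmacro

; Usage, mirroring the r12 version above
READ_TSC
mov r12, rax                ; Save the start timestamp

; [...stuff (that doesn't use r12)...]

READ_TSC
sub rax, r12                ; rax = elapsed ticks
```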
