What is the difference between the ret instruction in x86 and x64?

Question

I was recently trying out a stack overflow exercise on x64. When performing this on x86, I would expect the following for a junk overwrite address (e.g. 'AAAA'):

1. The data I provide overflows the buffer, and overwrites the return address
2. Upon ret, the (overwritten) return address will be (effectively) popped into the EIP register
3. It is realised that the address is not valid, and a segmentation fault is raised

In x64, this seems different (beyond the interchange of EIP with RIP in the above steps). When providing a junk address of 'AAAAAAA', the processor seems to do some validity checking before popping the address. By observation, it seems required that the two most significant bytes of the address are null before it is loaded. Otherwise, a segfault occurs. I believe this is due to the use of 48-bit addressing in x64; however, I was under the impression that addresses starting with 0xFFFF were also valid, yet this also produces a segfault.

Is this an accurate description of the difference? Why is this check performed before the data is loaded into the RIP register, whilst the other validity check is performed afterwards? Are there any other differences between these instructions?

EDIT: To clarify my observations, I note that when an 8-byte return address is provided, the RIP still points to the address of the ret instruction, and the RSP still points to the overwritten return address on segfault. When a 6-byte return address is provided, the overwritten address has been popped into the RIP when the segfault is observed.

The check is done early because the rip register might not even have the bits in hardware so it can't be loaded. Yes, the top part of the address range should also be canonical, but you must use sign extension. So for example 0xffff800000000000 is canonical and will be loaded into rip and only fault afterwards :) 0xffff414141414141 is not canonical. — Jester
Addresses starting with 0xffff are only valid in kernel mode afaik. — fuz
@fuz: They're not automatically invalid (non-canonical), they're only protected by the normal page-table mechanism. (e.g. kernels set the U/S bit in PTEs so that most / all of those virtual pages are supervisor-only). Linux for example maps the ffffffffff600000-ffffffffff601000 range into user-space processes as the [vsyscall] page. (cat /proc/self/maps). — Peter Cordes
@Vortix: What exactly are you claiming happens? That RSP isn't updated or something? How are you distinguishing between code-fetch from an invalid page leading to a segfault vs. attempting to load a non-canonical address into RIP? The Operation section of the manual for ret is complicated by clutter from retf (far), but IA-32E-MODE-RETURN-TO-SAME-PRIVILEGE-LEVEL: includes IF the return instruction pointer is not within canonical address space THEN #GP(0); FI;. That's a #GP instead of a #PF, but a #GP and an invalid page fault both get the kernel to deliver SIGSEGV. — Peter Cordes
You should edit your question to say that more clearly. Interesting that RSP doesn't get updated before the fault. So it's not code-fetch from a non-canonical address that faults, it's the ret instruction's attempt to set RIP to a non-canonical address. That makes the whole RET instruction fault, meaning that none of its effects are visible. — Peter Cordes

Peter Cordes · Accepted Answer · 2020-06-13T22:30:51

Interesting that RSP doesn't get updated before the fault. So it's not code-fetch from a non-canonical address that faults, it's the ret instruction's attempt to set RIP to a non-canonical address.

That makes the whole RET instruction fault, meaning that none of its effects are visible. (Because Intel's manual doesn't define any partial-progress / stuff updated even on fault behaviour for ret.)

Unfortunately the Operation section for ret in Intel's manual is a rat's nest of conditionals because they use one block to document near and far, and every combination of mode and operand-size. Plain ret in 64-bit mode is "IA-32e mode", operand-size=64, and "near" (not changing CS to a different code segment, just changing RIP).

Note that the upper half of the canonical range starts at 0xffff800000000000: 48 bits sign-extended to 64. 0xffff7f0000000000 is not canonical. The upper 16 bits have to match bit 47 (the most significant bit of the 48-bit address, counting from 0).
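The sign-extension rule can be checked with a few lines of Python. This is a minimal sketch assuming 48-bit virtual addresses (CPUs with 5-level paging use 57 bits instead); the function names are made up for illustration.

```python
def sign_extend_48(addr):
    """Keep the low 48 bits and copy bit 47 into bits 63..48."""
    low = addr & ((1 << 48) - 1)
    if low & (1 << 47):
        low |= 0xFFFF << 48
    return low


def is_canonical(addr):
    # An address is canonical iff sign-extending its low 48 bits reproduces it.
    return sign_extend_48(addr) == addr


for a in (0x00007FFFFFFFFFFF, 0x0000800000000000,
          0xFFFF7F0000000000, 0xFFFF800000000000,
          0x4141414141414141, 0x0000414141414141):
    print(f"{a:#018x}", "canonical" if is_canonical(a) else "non-canonical")
```

With this check, an 8-byte 'A' pattern is non-canonical while the 6-byte pattern (zero-padded to 8 bytes) is canonical, which matches the two cases described in the question.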
