Where is it pushed on?
esp - 4
. More precisely:
esp
gets subtracted by 4
- the value is pushed to
esp
pop
reverses this.
The System V ABI tells Linux to make rsp
point to a sensible stack location when the program starts running: What is default register state when program launches (asm, linux)? which is what you should usually use.
How can you push a register?
Minimal GNU GAS example:
.data
/* .long takes 4 bytes each. */
val1:
/* Store bytes 0x 01 00 00 00 here. */
.long 1
val2:
/* 0x 02 00 00 00 */
.long 2
.text
/* Make esp point to the address of val2.
* Unusual, but totally possible. */
mov $val2, %esp
/* eax = 3 */
mov $3, %ea
push %eax
/*
Outcome:
- esp == val1
- val1 == 3
esp was changed to point to val1,
and then val1 was modified.
*/
pop %ebx
/*
Outcome:
- esp == &val2
- ebx == 3
Inverses push: ebx gets the value of val1 (first)
and then esp is increased back to point to val2.
*/
The above on GitHub with runnable assertions.
Why is this needed?
It is true that those instructions could be easily implemented via mov
, add
and sub
.
They reason they exist, is that those combinations of instructions are so frequent, that Intel decided to provide them for us.
The reason why those combinations are so frequent, is that they make it easy to save and restore the values of registers to memory temporarily so they don't get overwritten.
To understand the problem, try compiling some C code by hand.
A major difficulty, is to decide where each variable will be stored.
Ideally, all variables would fit into registers, which is the fastest memory to access (currently about 100x faster than RAM).
But of course, we can easily have more variables than registers, specially for the arguments of nested functions, so the only solution is to write to memory.
We could write to any memory address, but since the local variables and arguments of function calls and returns fit into a nice stack pattern, which prevents memory fragmentation, that is the best way to deal with it. Compare that with the insanity of writing a heap allocator.
Then we let compilers optimize the register allocation for us, since that is NP complete, and one of the hardest parts of writing a compiler. This problem is called register allocation, and it is isomorphic to graph coloring.
When the compiler's allocator is forced to store things in memory instead of just registers, that is known as a spill.
Does this boil down to a single processor instruction or is it more complex?
All we know for sure is that Intel documents a push
and a pop
instruction, so they are one instruction in that sense.
Internally, it could be expanded to multiple microcodes, one to modify esp
and one to do the memory IO, and take multiple cycles.
But it is also possible that a single push
is faster than an equivalent combination of other instructions, since it is more specific.
This is mostly un(der)documented:
b
,w
,l
, orq
to denote the size of the memory being manipulated. Ex:pushl %eax
andpopl %eax
– Hawken%eax
is always 32 bit in size. – Gunther Piez