Call Instruction: Compilation into machine code

Question

How is the assembly call instruction compiled into machine code? What happens to labels? How does the machine-code call instruction refer to a specific function when the label no longer exists?

I know that the labels, in the compiled code, are replaced by the addresses of the function.

However, the instructions of the function are loaded into RAM only when the program runs. So how does the machine code refer to a specific function that was identified, before compilation, by a label?

Please answer me in a simple and understandable way, possibly with a practical example.

"Labels are replaced by the addresses of the function" is really all there is to it. The assembler assigns sequential addresses to the machine code instructions as it emits them, so it knows exactly where any given function will be located in memory at runtime. The virtual memory hardware on modern computers allows all programs to reside at the same starting memory address, even if multiple programs are running, because each one has its own private memory space. — jasonharper

Erik Eidt · Accepted Answer · 2019-12-22T19:42:43

Assembly language labels are an assembly-time and link-time construct. During assembly and/or linking, each label is given a memory address, sometimes as an absolute address, but often as an offset from the beginning of the section the label occurs in.

The labels (and the mapping from symbol/label name to address or offset) are omitted from the machine code; today's processors don't know about or see assembly labels in machine code.
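As a rough illustration of how a label turns into a number, here is a minimal sketch of a two-pass toy assembler. The instruction names, the fixed 4-byte instruction size, and the function name resolve_labels are assumptions made up for this example; real assemblers and linkers are considerably more involved.

```python
# Toy two-pass label resolution. Assumption: every instruction is 4 bytes long.
SOURCE = [
    "main:",
    "    call square",
    "    halt",
    "square:",
    "    mul",
    "    ret",
]


def resolve_labels(lines, base=0):
    """Return (symbol_table, instructions_with_labels_replaced)."""
    symbols = {}
    instructions = []
    address = base

    # Pass 1: record the address of each label. Labels occupy no space.
    for line in lines:
        text = line.strip()
        if text.endswith(":"):
            symbols[text[:-1]] = address
        elif text:
            instructions.append((address, text))
            address += 4

    # Pass 2: replace every label operand with its numeric address.
    resolved = []
    for address, text in instructions:
        parts = text.split()
        operands = [str(symbols.get(p, p)) for p in parts[1:]]
        resolved.append((address, " ".join([parts[0]] + operands)))
    return symbols, resolved


symbols, resolved = resolve_labels(SOURCE)
for address, text in resolved:
    print(f"{address:#06x}: {text}")
```

After the second pass, "call square" has become "call 8": the name is gone and only the number remains in the output.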

In assembly language, call instructions, as well as the branch instructions used for if-then, while, etc., have a branch target as one of their operands. For most of these instructions, the machine code encodes that operand as a pc-relative offset stored in an immediate field of the instruction.

See pc-relative addressing mode.

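The hardware turns a pc-relative offset in an immediate field back into an absolute address, using a formula something like address of instruction + immediate:

pc_{next-cycle} := pc_{current-branch-instruction} + immediate * Scale + Bias

 CPU      Scale   Bias
 ------------------------------------------
 x86      1       length of the instruction
 MIPS     4       4
 RISC-V   2       0

On x86 the offset is measured from the end of the branch or call instruction, so the bias is that instruction's length in bytes (5 for a call with a 32-bit offset).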

The assembler/linker then uses the same formula, though solved for the immediate, effectively reversing this computation:

offset = (label_target - pc_{current-branch-instruction} - Bias) / Scale

This offset is then encoded in the immediate field of the branch or call instruction.
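To make this concrete for a call, here is a minimal sketch of encoding and decoding the x86 near call with a 32-bit relative offset (opcode E8). On x86 that offset is measured from the address of the following instruction, which is the call's own address plus its 5-byte length. The addresses and function names are hypothetical and chosen only for illustration.

```python
import struct

CALL_REL32_LENGTH = 5  # 1 opcode byte (0xE8) + 4 offset bytes


def encode_call_rel32(call_address, target_address):
    """Do what the assembler/linker does: turn a target address into bytes."""
    # The offset counts from the end of the call instruction.
    offset = target_address - (call_address + CALL_REL32_LENGTH)
    return b"\xE8" + struct.pack("<i", offset)  # little-endian, signed


def decode_call_rel32(call_address, machine_code):
    """Do what the CPU does: turn the stored offset back into an address."""
    (offset,) = struct.unpack("<i", machine_code[1:5])
    return call_address + CALL_REL32_LENGTH + offset


# Hypothetical layout: the call sits at 0x401000, the function at 0x401020.
code = encode_call_rel32(0x401000, 0x401020)
print(code.hex())                               # e81b000000
print(hex(decode_call_rel32(0x401000, code)))   # 0x401020
```

Because only the distance between the call and its target is stored, the same bytes remain valid if the whole program is loaded at a different base address.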

On a MIPS processor, for example, a branch instruction has a 16-bit immediate field for holding such an offset. An immediate value of -1 will branch to self, an immediate value of -5 will branch backwards 4 instructions, and an immediate value of +5 will branch forwards 6 instructions.
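The three MIPS cases above can be checked with a short sketch that applies Scale = 4 and Bias = 4 to a sign-extended 16-bit field. The pc value and the function names are assumptions for illustration only.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mips_branch_target(pc, immediate_field):
    """Hardware side: target = pc + immediate * 4 + 4."""
    return pc + sign_extend_16(immediate_field) * 4 + 4


def mips_branch_immediate(pc, target):
    """Assembler side: the same formula solved for the immediate."""
    distance = target - pc - 4
    if distance % 4 != 0:
        raise ValueError("branch target is not word-aligned")
    offset = distance // 4
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("branch target out of 16-bit range")
    return offset & 0xFFFF  # value stored in the 16-bit field


pc = 0x00400100  # hypothetical address of the branch instruction
for immediate in (-1, -5, 5):
    target = mips_branch_target(pc, immediate & 0xFFFF)
    # Prints the distance in instructions: 0, -4, and 6 respectively.
    print(immediate, (target - pc) // 4)
```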

The MIPS call instruction (jal) uses a mostly absolute address rather than a pc-relative one: it carries a 26-bit immediate that the hardware shifts left by 2 and combines with the upper 4 bits of the pc to form the target. (The immediate is not added to the pc, so this is not really considered pc-relative addressing.)

On RISC-V and x86, both branch and call instructions can be done using pc-relative addressing, though on x86 absolute and other addressing modes are available as well.
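For comparison with the pc-relative case, here is a minimal sketch of how the 26-bit field of the MIPS jal instruction is formed and turned back into a target address. It assumes the classic 32-bit MIPS encoding, in which the field holds the target's word index and the top 4 address bits are taken from the address of the instruction after the jal; the addresses and function names are hypothetical.

```python
JAL_OPCODE = 0x03          # 6-bit opcode of jal
FIELD_MASK = 0x03FFFFFF    # low 26 bits of the instruction


def encode_jal(pc, target):
    """Assembler/linker side: build the 32-bit jal instruction word."""
    if target % 4 != 0:
        raise ValueError("jump target is not word-aligned")
    # The target must lie in the same 256 MB region as the next instruction.
    if (pc + 4) & 0xF0000000 != target & 0xF0000000:
        raise ValueError("jump target outside the reachable region")
    return (JAL_OPCODE << 26) | ((target >> 2) & FIELD_MASK)


def jal_target(pc, instruction):
    """Hardware side: rebuild the target; the field is never added to the pc."""
    field = instruction & FIELD_MASK
    return ((pc + 4) & 0xF0000000) | (field << 2)


# Hypothetical layout: jal at 0x00400000 calling a function at 0x00400020.
word = encode_jal(0x00400000, 0x00400020)
print(f"{word:#010x}")                       # 0x0c100008
print(hex(jal_target(0x00400000, word)))     # 0x400020
```

Unlike the pc-relative encodings, the stored field here changes only with the target's address, not with the location of the call.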
