0
votes

A thread traps to the kernel with INT 0x80; the interrupt gate is used to change the privilege level and CS:RIP, and the old values are pushed onto the 'stack'.

I found this:

'When a thread enters the kernel, the current value of the user-mode stack (SS:ESP) and instruction pointer (CS:EIP) are saved to the thread's kernel-mode stack, and the CPU switches to the kernel-mode stack - with the int $80 syscall mechanism, this is done by the CPU itself. The remaining register values and flags are then also saved to the kernel stack.'

How does the CPU know the address of the thread's kernel-mode stack in order to do this? The only place I can think of where the thread's kernel-mode stack pointer is stored is the TCB, but how does the CPU know where to find the TCB for the current thread? Does it refer to a single TCB at a fixed and known location?


1 Answer

2
votes

Note: This is all 80x86-specific (other CPUs, such as ARM, work differently), and on 80x86 there are differences, minor in practice, between protected mode (used for 32-bit kernels) and long mode (used for 64-bit kernels).

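The CPU has a Task Register that keeps track of the (virtual) address of a structure called the Task State Segment. (Strictly, the Task Register holds a selector for a TSS descriptor in the GDT, and the CPU caches the base address from that descriptor.) In long mode, this structure contains the values to load into RSP when changing to a more privileged level (the RSP0 to RSP2 fields) and when using the CPU's Interrupt Stack Table feature (the IST1 to IST7 fields).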
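As a rough illustration of the structure described above, here is a minimal sketch of the long mode TSS layout in C. The struct and field names are my own (hypothetical), and the `packed` attribute assumes GCC or Clang; only the offsets and sizes come from the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit Task State Segment. Field names are illustrative only. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: loaded into RSP on a switch to CPL 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: Interrupt Stack Table entries */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
} __attribute__((packed));

/* Compile-time checks that the layout matches what the CPU expects. */
static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 must be at offset 4");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 must be at offset 36");
static_assert(offsetof(struct tss64, iomap_base) == 102, "I/O map base at 102");
static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
```

The struct has to be packed because the 64-bit fields start at offset 4, which a compiler would otherwise pad to 8.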

When any interrupt occurs, the CPU obtains information from the corresponding entry in the Interrupt Descriptor Table (including which privilege level to switch to and whether the interrupt uses the Interrupt Stack Table feature). Then, if the stack is being changed, it uses the appropriate field in the Task State Segment (found via the Task Register) to determine what to load into RSP.
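The decision the CPU makes here can be modelled in a few lines of C. This is only a simplified sketch of the long mode behaviour, not real kernel or microcode logic: the function name, the `tss_stacks` struct and the parameters are hypothetical, and it ignores details such as the 16-byte stack alignment and the pushing of the interrupt frame.

```c
#include <stdint.h>

/* Only the stack-related TSS fields, for illustration. */
struct tss_stacks {
    uint64_t rsp[3];   /* RSP0..RSP2 */
    uint64_t ist[7];   /* IST1..IST7 */
};

/*
 * Model of which stack pointer the CPU uses when it delivers an interrupt.
 *   cpl         - privilege level of the interrupted code (0..3)
 *   handler_cpl - privilege level of the handler's code segment
 *   ist_index   - IST field of the IDT gate (0 means "IST not used", else 1..7)
 *   current_rsp - RSP at the moment of the interrupt
 */
uint64_t interrupt_stack(const struct tss_stacks *tss, unsigned cpl,
                         unsigned handler_cpl, unsigned ist_index,
                         uint64_t current_rsp)
{
    if (ist_index != 0)
        return tss->ist[ist_index - 1];   /* IST always forces a stack switch */
    if (handler_cpl < cpl)
        return tss->rsp[handler_cpl];     /* privilege change: use RSPn from the TSS */
    return current_rsp;                   /* same privilege: keep the current stack */
}
```

For the int 0x80 case in the question, `cpl` is 3, `handler_cpl` is 0 and the gate normally has no IST entry, so the result is the RSP0 field of the TSS.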

During boot, the kernel creates these tables and structures for the CPU (IDT, TSS) and sets the Task Register. During task switches, the kernel modifies the RSP0 field in the TSS (which determines what is loaded into RSP when the CPU switches from a lower privilege level to CPL=0) so that it holds a different value for each task, giving each task its own kernel stack. Internally, the kernel has some other data structure (the "task control block") that keeps track of the value to copy into the RSP0 field during task switches. The CPU itself never looks at the TCB; it only reads the TSS.
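The RSP0 update during a task switch might look something like the following. This is a hedged sketch with made-up names (`struct task`, `struct cpu_tss`, `switch_kernel_stack`), not code from any particular kernel; a real kernel would also save and restore registers, switch address spaces, and keep one TSS per CPU.

```c
#include <stdint.h>

/* Hypothetical task control block: the kernel's own bookkeeping. */
struct task {
    uint64_t kernel_stack_top;   /* top of this task's kernel stack */
};

/* Hypothetical view of the one TSS field that matters here. */
struct cpu_tss {
    uint64_t rsp0;               /* loaded into RSP on a switch to CPL=0 */
};

/*
 * Called as part of a task switch: copy the next task's kernel stack
 * pointer from its TCB into the TSS, so that the next trap from user
 * mode lands on that task's kernel stack.
 */
void switch_kernel_stack(struct cpu_tss *tss, const struct task *next)
{
    tss->rsp0 = next->kernel_stack_top;
}
```

The TSS stays at the same address the whole time; only the value in RSP0 changes, which is why the CPU does not need to know anything about the TCB.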