7
votes

I'm trying to understand which events can cause a transition from userspace to the Linux kernel. If it's relevant, the scope of this question can be limited to the x86/x86_64 architecture.

Here are some sources of transitions that I'm aware of:

  • System calls (which include accessing devices) cause a context switch from userspace to kernel space.
  • Interrupts will cause a context switch. As far as I know, this also includes scheduler preemptions, since a scheduler usually relies on a timer interrupt to do its work.
  • Signals. It seems like at least some signals are implemented using interrupts but I don't know if some are implemented differently so I'm listing them separately.

I'm asking two things here:

  1. Am I missing any userspace->kernel path?
  2. What are the various code paths that are involved in these context switches?
2
I'm pretty sure the only way you can change from user mode to kernel mode on x86 is a SYSENTER or an interrupt, both of which pass through arch/x86/entry/entry_{32,64}.S. System calls can be done as a SYSENTER or INT 80h. Some signals are caused by interrupts from the processor (e.g. SIGSEGV), but entry from user space to kernel space is still done using an interrupt. - Mikel Rychliski
How high would you say your confidence level is? :) I will wait a while and see if someone comes up with something else that could happen, but if not, why not put your comment as an answer? I might just accept it. :) - nitzanms
I think you've conflated two separate concepts: whether or not control passes to kernel code and kernel resources, and whether or not a context switch occurs. Most modern operating systems, Linux included, map the kernel memory into every user process, but with restricted permissions. This is done specifically so that interrupts can run without causing a context switch, needing instead just a processor state change that allows instructions to access kernel memory. IIRC, the only time a full context switch occurs is when a kernel thread is scheduled and run, e.g. to perform some deferred processing. - antiduh
To further illustrate what I mean about one of the other mechanisms being used, consider a few example signals. In the case of a null pointer dereference causing a SIGSEGV, the kernel transition here is actually caused by a page fault, which is a type of exception. In the case of a process raising a signal itself, the kernel transition is caused by the kill() system call entry. In the case of a signal being sent from a process running on another CPU while the target is running in userspace, the kernel transition is caused by an Inter-Processor Interrupt. - caf
My point is just that signals themselves do not effect a switch to kernel mode; they always use one of the underlying mechanisms (system call, asynchronous interrupt, exception). - caf

2 Answers

4
votes

One you are missing: Exceptions

(which can be further broken down into faults, traps and aborts)

Examples include a page fault, a breakpoint, a division by zero or a floating-point exception. Technically, one can view exceptions as interrupts, but not in the sense in which you defined an interrupt in your question.

You can find a list of x86 exceptions at this osdev webpage.
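To make the exception path concrete, here is a minimal sketch in C, assuming Linux and a POSIX libc. It writes to a page that was mapped read-only, so the CPU raises a page fault (an exception), the kernel takes over, and it reports the fault back to the process as SIGSEGV. The helper name `trigger_write_fault` and the handler `on_fault` are made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;              /* where to resume after the fault */
static volatile sig_atomic_t last_signal = 0; /* signal number seen by the handler */

/* Runs after the kernel has handled the fault and decided to signal us. */
static void on_fault(int signo)
{
    last_signal = signo;
    siglongjmp(recover_point, 1); /* skip the faulting store instead of retrying it */
}

/* Hypothetical helper: write to a read-only page and report which signal arrived. */
int trigger_write_fault(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);

    long page_size = sysconf(_SC_PAGESIZE);
    volatile char *page = mmap(NULL, (size_t)page_size, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(page != MAP_FAILED);

    last_signal = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        page[0] = 1; /* store to a read-only page: the CPU raises a page fault */
    }

    munmap((void *)page, (size_t)page_size);
    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return last_signal;
}
```

Calling `trigger_write_fault()` on Linux should return `SIGSEGV`. Note that the program never executes an explicit "enter the kernel" instruction here; the transition is triggered by the faulting store itself.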

With regard to your second question:

What are the various code paths that are involved in these context switches?

That really depends on the architecture and OS; you will need to be more specific. For x86, when an interrupt occurs the CPU jumps to the handler named in the corresponding IDT entry, and for SYSENTER it jumps to the address specified in an MSR. What happens after that is completely up to the OS.

1
votes

No one wrote a complete answer so I will try to incorporate the comments and partial answers into an answer. Feel free to comment or edit the answer to improve it.

For the purposes of this question and answer, userspace to kernel transitions mean a change in processor state that allows access to kernel code and memory. For short, I will refer to these transitions as context switches.

When discussing events that can trigger userspace to kernel transitions, it is important to separate the OS constructs that we are used to (signals, system calls, scheduling), which require context switches, from the way these constructs are implemented using context switches.

In x86, there are two central ways for context switches to occur: interrupts and SYSENTER. Interrupts are a processor feature that causes a context switch when certain events happen:

  • Hardware devices may request an interrupt. For example, a timer/clock can cause an interrupt when a certain amount of time has elapsed, and a keyboard can interrupt when keys are pressed. This type of interrupt is called a hardware interrupt.
  • Userspace can initiate an interrupt. For example, the old way to perform a system call in Linux on x86 was to execute INT 0x80 with arguments passed through the registers. Debugging breakpoints are also implemented using interrupts, with the debugger replacing an instruction with INT 3. This type of interrupt is called a software interrupt.
  • The CPU itself generates interrupts in certain situations, like when memory is accessed without permissions or when a user divides by zero. This type of interrupt is called an exception, and you can read more about them in @esm 's answer. (When one core must notify another core that it needs to do something, it sends an inter-processor interrupt, which arrives like a hardware interrupt rather than as an exception.)
  • For a broader discussion of interrupts see here: http://wiki.osdev.org/Interrupt

SYSENTER (or its counterpart SYSCALL, which 64-bit Linux uses) is an instruction that provides the modern path to cause a context switch for the particular case of performing a system call.

The code that handles the context switching due to interrupts or SYSENTER in Linux can be found in arch/x86/kernel/entry_{32|64}.S (or, as noted in the comments on the question, arch/x86/entry/entry_{32,64}.S in newer kernels).
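From the userspace side, the system call path can be exercised without writing any assembly. The following is a small sketch, assuming Linux with glibc or a compatible libc: it uses the generic `syscall(2)` wrapper to request `SYS_getpid` directly, and the wrapper executes whichever entry instruction the platform uses. The function names `raw_getpid` and `check_raw_getpid` are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID through the generic syscall(2) wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Sanity check: the raw call and the usual libc wrapper should agree. */
void check_raw_getpid(void)
{
    assert(raw_getpid() == (long)getpid());
}
```

Running a program that calls `raw_getpid()` under `strace` is an easy way to see the kernel entry being made.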

There are many situations in which a higher-level Linux construct might cause a context switch. Here are a few examples:

  • If a system call reaches an int 0x80 or sysenter instruction, a context switch occurs. Some system call routines can produce their result from information that is already available in userspace (for example, time queries served through the vDSO). In this case, no context switch will occur.
  • Often scheduling doesn't require an interrupt: a thread will perform a system call, and the return from the syscall is delayed until it is scheduled again. For processes that are in a section where syscalls aren't performed, Linux relies on timer interrupts to gain control.
  • Virtual memory access to a memory location that was paged out will cause a page fault, and therefore a context switch. (A segmentation fault is reported only if the kernel finds that the access was invalid.)
  • Signals are usually delivered when a process is already "switched out" (see comments by @caf on the question), but sometimes an inter-processor interrupt is used to deliver the signal between two running processes.
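As a sketch of the point about signals riding on other mechanisms, the following C snippet (assuming Linux/POSIX and a single-threaded process) sends SIGUSR1 to itself with `kill()`. The only kernel entry involved is the `kill()` system call, and the handler has already run by the time the call returns. The names `signal_self_via_kill` and `on_usr1` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0; /* set by the handler */

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Hypothetical helper: returns 1 if the handler ran before kill() returned. */
int signal_self_via_kill(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, &old);
    assert(rc == 0);

    got_usr1 = 0;
    rc = kill(getpid(), SIGUSR1); /* kernel entry happens via this system call */
    assert(rc == 0);

    sigaction(SIGUSR1, &old, NULL); /* restore the previous disposition */
    return got_usr1;
}
```

In a single-threaded process with SIGUSR1 unblocked, `signal_self_via_kill()` should return 1, because the kernel delivers the pending signal on the way back to userspace from the system call.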