1
votes

I am trying to understand how user/kernel boundaries work in operating systems.

From what I've read, if something at the user level tries to perform a forbidden action, the hardware triggers a trap and passes control back to the OS at the kernel level, and the kernel then deals with the situation.

How is that even possible? How can something go directly from the User level to Hardware, aren't all the interactions made through System calls (that are at the Kernel level)? If so, the kernel could check whether an action is illegal before it ever reaches the hardware.

I am slightly confused. I think a real example (a user-level application plus an OS such as Linux) of how this flow works would help me understand. If someone can provide one, I would really appreciate it.

1 Answer

2
votes

How can something go directly from the User level to Hardware, aren't all the interactions made through System calls (that are at the Kernel level)?

Many interactions aren't made through system calls. As a practical example, on 80x86 the things that can cause a switch from user to kernel are:

a) Any interrupt, which includes:

  • any exception (debug exception, divide exception, invalid opcode exception, general protection fault, page fault, machine check exception, ... ). These indicate programming bugs (e.g. division by zero), hardware failures (machine check exception), or opportunities for the kernel to extend functionality (e.g. a page fault telling the kernel to fetch data from swap space, which lets the kernel give programs more "memory" than there is physical RAM).

  • any IRQ (from a device asking for attention)

  • any interrupt sent from (software/kernel running on) one CPU to another CPU

  • software interrupts (from same CPU)

b) Certain special instructions (typically used for kernel API entry point/s):

  • SYSCALL, SYSENTER

  • a "call far" or "jmp far" that involves either a call gate or a task gate

  • software interrupts (from same CPU). Mentioned twice because it fits in both categories.
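To make the exception case in (a) concrete, here is a minimal sketch, assuming Linux (or a similar Unix) and CPython. The child process makes no system call to "ask permission"; it simply reads address 0 through `ctypes`. The CPU raises a page fault, the kernel's handler decides the access is illegal, and the kernel terminates the process with a signal, which the parent can observe. The names `CHILD_CODE` and `run_faulting_child` are made up for this illustration.

```python
import signal
import subprocess
import sys

# Hypothetical child program: asks ctypes to read a C string at address 0,
# which is not mapped in a normal user process.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_faulting_child():
    """Run a child that touches an unmapped address and return its exit status.

    On POSIX systems, subprocess reports "killed by signal N" as -N.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_faulting_child()
    if status < 0:
        # The kernel ended the child; the signal number says why.
        print("child killed by", signal.Signals(-status).name)
    else:
        print("child exited normally with status", status)
```

On a typical Linux machine I would expect the parent to report that the child was killed by `SIGSEGV`. The point is that the illegal access was detected by hardware at the moment it happened, and only then did control reach the kernel.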

Note that a kernel may provide multiple entry points for various reasons: one for 64-bit processes and another for 32-bit processes; one for all processes and another that only special/trusted processes can use; one that's fast but large for frequently executed code and another that's small but slower to reduce code size in infrequently executed code; and so on. A kernel may also use an interrupt (e.g. an exception; I've used the "breakpoint exception" before) instead of, or in addition to, a special instruction for its kernel API.

All of these things share two characteristics: they all cause or allow the boundary between user level and kernel to be crossed, and the location of the code that control is passed to is determined by the kernel, not by the caller (user-level code), so that the kernel can defend and secure all of its entry points.

Also note that in some of these cases (IRQs, machine check exceptions) hardware is responsible, not software. This is necessary to ensure hardware receives attention in a timely manner. It also means a malicious "denial of service" process can't just run an infinite loop to prevent the kernel from using the CPU, which is what you'd get if system calls were the only way to cross the user/kernel boundary.
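The infinite-loop point can be observed from user space. The sketch below, assuming a Unix-like OS where Python's `resource` module is available, spins in a loop that never sleeps or blocks, then asks the kernel how many times the process was switched off the CPU anyway. The function name `context_switches_during_spin` is hypothetical.

```python
import resource
import time


def context_switches_during_spin(seconds=0.5):
    """Busy-loop for `seconds` and return (voluntary, involuntary) switch counts.

    Involuntary switches are ones the process did not ask for: the kernel
    regained the CPU (for example via a timer interrupt) and ran something else.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # never sleeps, never blocks, never yields on purpose
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (
        after.ru_nvcsw - before.ru_nvcsw,    # voluntary context switches
        after.ru_nivcsw - before.ru_nivcsw,  # involuntary context switches
    )


if __name__ == "__main__":
    voluntary, involuntary = context_switches_during_spin()
    print("voluntary:", voluntary, "involuntary:", involuntary)
```

On a busy machine the involuntary count is usually above zero; on an idle multi-core machine it may well be zero, because nothing else was competing for that CPU. Either way the loop itself contains nothing that hands control to the kernel, so any switch that does occur was forced from outside the program.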

Finally, almost all CPUs intended for general-purpose use (excluding some tiny microprocessors of the kind embedded in things like microwave ovens) have capabilities similar to 80x86; often only the lower-level implementation details vary.

For an example of what might happen while a (user) process is running:

  • the program executes some code, then tries to access data that isn't actually in RAM, causing a page fault; the kernel fetches the data the program wants from a memory-mapped file

  • the program executes some more code, triggering an invalid opcode exception; the kernel emulates a newer instruction that isn't supported on the current/older CPU

  • the program executes some more code, but is interrupted by an IRQ from a network card; the kernel/device driver arranges for some more TCP/IP packets to be sent/received

  • the program executes some more code, but is interrupted by a timer; the kernel does some task switches to give other processes some CPU time

  • the program executes some more code, and is interrupted by the CPU telling the kernel that it's getting hot; the kernel might migrate the process to a different, cooler CPU

The program itself won't be aware that any of these things happened; it will just think it's been using a CPU the whole time, when it hasn't.
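The page fault case is easy to watch on Linux. This is a minimal sketch, assuming a Unix-like OS with Python's `mmap` and `resource` modules. It maps some anonymous memory, writes one byte to each page, and asks the kernel how many minor page faults the process took meanwhile. The function name `minor_faults_while_touching` is hypothetical.

```python
import mmap
import resource


def minor_faults_while_touching(num_pages=1024):
    """Touch each page of a fresh anonymous mapping; return minor faults taken."""
    page = mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    # Anonymous mapping: the kernel typically hands out address space now
    # and only attaches physical pages when they are first touched.
    buf = mmap.mmap(-1, num_pages * page)
    for offset in range(0, num_pages * page, page):
        buf[offset] = 1  # first write to this page; may trap into the kernel
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    buf.close()
    return after - before


if __name__ == "__main__":
    print("minor page faults:", minor_faults_while_touching())
```

On a typical Linux system the count is roughly one fault per page touched, though features such as transparent huge pages can make it much smaller. The loop is ordinary assignment with no error handling; each fault was taken by the CPU, serviced by the kernel, and the write was resumed without the program noticing.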