15
votes

I have a general question about the linux scheduler and some other similar kernel system calls.

Is the linux scheduler considered a "process" and every call to the scheduler requires a context switch like its just another process?

Say we have a clock tick which interrupts the current running user mode process, and we now have to call the scheduler. Does the call to the scheduler itself provokes a context switch? Does the scheduler has its own set of registers and U-area and whatnot which it has to restore at every call?

And the said question applies to many other system calls. Do kernel processes behave like regular processes in regard to context switching, the only difference is that they have more permissions and access to the cpu?

I ask this because context switch overhead is expensive. And it sounds odd that calling the scheduler itself provokes a context switch to restore the scheduler state, and after that the scheduler calls another process to run and again another context switch.

3
Actually there was a question :- "How Dispatcher (Scheduler) works ?" I got an answer from one of my Senior. See my comment : stackoverflow.com/questions/10838901/…Shekhar Kumar

3 Answers

5
votes

That's a very good question, and the answer to it would be "yes" except for the fact that the hardware is aware of the concept of an OS and task scheduler.

In the hardware, you'll find registers that are restricted to "supervisor" mode. Without going into too much detail about the internal CPU architecture, there's a copy of the basic program execution registers for "user mode" and "supervisor mode," the latter of which can only be accessed by the OS itself (via a flag in a control register that the kernel sets which says whether or not the kernel or a user mode application is currently running).

So the "context switch" you speak of is the process of swapping/resetting the user mode registers (instruction register, stack pointer register, etc.) etc. but the system registers don't need to be swapped out because they're stored apart from the user ones.

For instance, the user mode stack in x86 is USP - A7, whereas the supervisor mode stack is SSP - A7. So the kernel itself (which contains the task scheduler) would use the supervisor mode stack and other supervisor mode registers to run itself, setting the supervisor mode flag to 1 when it's running, then perform a context switch on the user mode hardware to swap between apps and setting the supervisor mode flag to 0.

But prior to the idea of OSes and task scheduling, if you wanted to do a multitasking system then you'd have had to use the basic concept that you outlined in your question: use a hardware interrupt to call the task scheduler every x cycles, then swap out the app for the task scheduler, then swap in the new app. But in most cases the timer interrupt would be your actual task scheduler itself and it would have been heavily optimized to make it less of a context switch and more of a simple interrupt handler routine.

1
votes

Actually you can check the code for the schedule() function in kernel/sched.c. It is admirably well-written and should answer most of your question.

But bottom-line is that the Linux scheduler is invoked by calling schedule(), which does the job using the context of its caller. Thus there is no dedicated "scheduler" process. This would make things more difficult actually - if the scheduler was a process, it would also have to schedule itself!

When schedule() is invoked explicitly, it just switches the contexts of the caller thread A with the one of the selected runnable thread B such as it will return into B (by restoring register values and stack pointers, the return address of schedule() will become the one of B instead of A).

1
votes

Here is an attempt at a simple description of what goes on during the dispatcher call:

  1. The program that currently has context is running on the processor. Registers, program counter, flags, stack base, etc are all appropriate for this program; with the possible exception of an operating-system-native "reserved register" or some such, nothing about the program knows anything about the dispatcher.
  2. The timed interrupt for dispatcher function is triggered. The only thing that happens at this point (in the vanilla architecture case) is that the program counter jumps immediately to whatever the PC address in the BIOS interrupt is listed as. This begins execution of the dispatcher's "dispatch" subroutine; everything else is left untouched, so the dispatcher sees the registers, stack, etc of the program that was previously executing.
  3. The dispatcher (like all programs) has a set of instructions that operate on the current register set. These instructions are written in such a way that they know that the previously executing application has left all of its state behind. The first few instructions in the dispatcher will store this state in memory somewhere.
  4. The dispatcher determines what the next program to have the cpu should be, takes all of its previously stored state and fills registers with it.
  5. The dispatcher jumps to the appropriate PC counter as listed in the task that now has its full context established on the cpu.

To (over)simplify in summary; the dispatcher doesn't need registers, all it does is write the current cpu state to a predetermined memory location, load another processes' cpu state from a predetermined memory location, and jumps to where that process left off.