4 votes

In the switch_to macro in 32-bit mode, the following code is executed before the __switch_to function is called:

asm volatile("pushfl\n\t"       /* save    flags */ \
         "pushl %%ebp\n\t"      /* save    EBP   */ \
         "movl %%esp,%[prev_sp]\n\t"    /* save    ESP   */ \
         "movl %[next_sp],%%esp\n\t"    /* restore ESP   */ \
         "movl $1f,%[prev_ip]\n\t"  /* save    EIP   */ \
         "pushl %[next_ip]\n\t" /* restore EIP   */ \
         __switch_canary                    \
         "jmp __switch_to\n"    /* regparm call  */ 

The EIP is pushed onto the stack (restore EIP). When __switch_to finishes, its ret returns to that location. Here is the corresponding 64-bit code:

    asm volatile(SAVE_CONTEXT                     \
     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */   \
     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */    \
     "call __switch_to\n\t" 

Here, only RSP is saved and restored. I think the RIP is already at the top of the stack, but I cannot find the instruction where that is done. How is the 64-bit context switch actually done, especially for the RIP register?

Thanks in advance!

Comments:
  • call obviously saves %%rip. – chqrlie
  • Yes, but we are talking about the rip of the next process. – tobawo
  • It was saved the same way on the target stack. – chqrlie

1 Answer

3 votes

In the 32-bit kernel, thread.ip may be one of:

  • the 1 label in switch_to
  • ret_from_fork
  • ret_from_kernel_thread

The return to the proper place is ensured by simulating a call using a push + jmp pair.
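To see why this emulation works, recall that a near call is just a push of the fall-through address followed by a jump. A minimal sketch of the equivalence (the 1 label here is illustrative, not taken from the kernel source):

        call  __switch_to       /* pushes the address of 1: and jumps  */

        /* ...behaves exactly like: */
        pushl $1f               /* push the fall-through address       */
        jmp   __switch_to       /* __switch_to's ret pops it into EIP  */
1:

By pushing %[next_ip] instead of the fixed $1f, the 32-bit code lets __switch_to's ret resume at whichever of the three addresses above the next task has stored in thread.ip.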

In the 64-bit kernel, thread.ip is not used this way. Execution always continues after the call (which corresponds to the 1 label in the 32-bit case). As such, there is no need to emulate the call; it can be done normally. Dispatching to ret_from_fork happens via a conditional jump after __switch_to returns (you have omitted this part):

#define switch_to(prev, next, last) \
        asm volatile(SAVE_CONTEXT                                         \
             "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */       \
             "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */    \
             "call __switch_to\n\t"                                       \
             "movq "__percpu_arg([current_task])",%%rsi\n\t"              \
             __switch_canary                                              \
             "movq %P[thread_info](%%rsi),%%r8\n\t"                       \
             "movq %%rax,%%rdi\n\t"                                       \
             "testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"                 \
             "jnz   ret_from_fork\n\t"                                    \
             RESTORE_CONTEXT                                              \
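So on 64 bit, the RIP handling falls out of the call/ret pair itself. Here is a sketch of the round trip, with hypothetical tasks A and B that were both suspended by this same macro (the A->thread.sp operands are pseudo-notation for the %P[threadrsp] accesses above):

    /* running as A, on A's kernel stack */
    movq %rsp, A->thread.sp   /* save A's RSP; A is suspended here       */
    movq B->thread.sp, %rsp   /* from here on we are on B's kernel stack */
    call __switch_to          /* pushes the return address (the movq
                                 after the call) onto B's stack          */
    /* __switch_to's ret pops that address: execution continues right
       after the call, but on B's stack, i.e. as task B                  */

Because every task suspends at exactly the same code address, no per-task RIP has to be stored; the only exception is a freshly forked task, which is exactly what the testl %[_tif_fork] / jnz ret_from_fork pair handles.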

ret_from_kernel_thread is incorporated into the ret_from_fork path, using yet another conditional jump in entry_64.S:

ENTRY(ret_from_fork)
        DEFAULT_FRAME

        LOCK ; btr $TIF_FORK,TI_flags(%r8)

        pushq_cfi $0x0002
        popfq_cfi                               # reset kernel eflags

        call schedule_tail                      # rdi: 'prev' task parameter

        GET_THREAD_INFO(%rcx)

        RESTORE_REST

        testl $3, CS-ARGOFFSET(%rsp)            # from kernel_thread?
        jz   1f