Segfault inline assembly

Question

I'm trying to create a green thread implementation based on this tutorial. However, my switch function is giving me a segfault because the register-loading code is not the last thing that runs in the function. Here is my code:

void ThreadSwitch(Thread in, Thread out) {
    if (!out && !in) {
        return;
    }
    if (out) {
        // save registers for out    
    }
    if (in) {
        SetCurrentThread(in);
        mtx_lock(&in->mutex);
        uint64_t rsp = in->cpu.rsp;
        uint64_t r15 = in->cpu.r15;
        uint64_t r14 = in->cpu.r14;
        uint64_t r13 = in->cpu.r13;
        uint64_t r12 = in->cpu.r12;
        uint64_t rbx = in->cpu.rbx;
        uint64_t rbp = in->cpu.rbp;
        mtx_unlock(&in->mutex);
        asm volatile("mov %[rsp], %%rsp\n"
                     "mov %[r15], %%r15\n"
                     "mov %[r14], %%r14\n"
                     "mov %[r13], %%r13\n"
                     "mov %[r12], %%r12\n"
                     "mov %[rbx], %%rbx\n"
                     "mov %[rbp], %%rbp\n"
                     :
                     : [rsp] "r"(rsp), [r15] "r"(r15), [r14] "r"(r14), [r13] "r"(r13),
                       [r12] "r"(r12), [rbx] "r"(rbx), [rbp] "r"(rbp));
    }
}

Xcode says that the inline assembly is causing a segfault, but my lldb disassembly looks like this (you can ignore 95% of it, just provided for context):

   0x1000f88b4:  movq   -0x8(%rbp), %rdi
   0x1000f88b8:  callq  0x1000f83a0               ; SetCurrentThread at thread.cc:21
   0x1000f88bd:  movq   -0x8(%rbp), %rdi
   0x1000f88c1:  addq   $0x50, %rdi
   0x1000f88c8:  callq  0x1000f7b80               ; mtx_lock at tct.c:106
   0x1000f88cd:  movq   -0x8(%rbp), %rdi
   0x1000f88d1:  movq   (%rdi), %rdi
   0x1000f88d4:  movq   %rdi, -0x18(%rbp)
   0x1000f88d8:  movq   -0x8(%rbp), %rdi
   0x1000f88dc:  movq   0x8(%rdi), %rdi
   0x1000f88e0:  movq   %rdi, -0x20(%rbp)
   0x1000f88e4:  movq   -0x8(%rbp), %rdi
   0x1000f88e8:  movq   0x10(%rdi), %rdi
   0x1000f88ec:  movq   %rdi, -0x28(%rbp)
   0x1000f88f0:  movq   -0x8(%rbp), %rdi
   0x1000f88f4:  movq   0x18(%rdi), %rdi
   0x1000f88f8:  movq   %rdi, -0x30(%rbp)
   0x1000f88fc:  movq   -0x8(%rbp), %rdi
   0x1000f8900:  movq   0x20(%rdi), %rdi
   0x1000f8904:  movq   %rdi, -0x38(%rbp)
   0x1000f8908:  movq   -0x8(%rbp), %rdi
   0x1000f890c:  movq   0x28(%rdi), %rdi
   0x1000f8910:  movq   %rdi, -0x40(%rbp)
   0x1000f8914:  movq   -0x8(%rbp), %rdi
   0x1000f8918:  movq   0x30(%rdi), %rdi
   0x1000f891c:  movq   %rdi, -0x48(%rbp)
   0x1000f8920:  movq   -0x8(%rbp), %rdi
   0x1000f8924:  addq   $0x50, %rdi
   0x1000f892b:  movl   %eax, -0x54(%rbp)
   0x1000f892e:  callq  0x1000f7de0               ; mtx_unlock at tct.c:264
   0x1000f8933:  movq   -0x18(%rbp), %rdi         ; beginning of inline asm 
   0x1000f8937:  movq   -0x20(%rbp), %rcx
   0x1000f893b:  movq   -0x28(%rbp), %rdx
   0x1000f893f:  movq   -0x30(%rbp), %rsi
   0x1000f8943:  movq   -0x38(%rbp), %r8
   0x1000f8947:  movq   -0x40(%rbp), %r9
   0x1000f894b:  movq   -0x48(%rbp), %r10
   0x1000f894f:  movq   %rdi, %rsp
   0x1000f8952:  movq   %rcx, %r15
   0x1000f8955:  movq   %rdx, %r14
   0x1000f8958:  movq   %rsi, %r13
   0x1000f895b:  movq   %r8, %r12
   0x1000f895e:  movq   %r9, %rbx
   0x1000f8961:  movq   %r10, %rbp                ; end of inline asm
-> 0x1000f8964:  movl   %eax, -0x58(%rbp)
   0x1000f8967:  addq   $0x60, %rsp
   0x1000f896b:  popq   %rbp
   0x1000f896c:  retq

The segfault happens when it tries to access stuff back on the stack, which makes sense because it just switched out the stack. But why is the compiler inserting this? The compiler also stores %eax on the stack at 0x1000f892b. Is the compiler opening up a register? Because it doesn't use %rax in the inline asm. Is there a workaround?

This is using Apple LLVM version 6.0 (clang-600.0.57) on OSX 10.10.2, if that's any help.

Thanks in advance.

It isn't obvious what the compiler is doing with eax there (maybe optimizations disabled?), however the address -0x58(%rbp) should be valid since you are switching back to a thread that itself executed the code that set up %rbp earlier. — Jester
I'll try making sure optimizations are disabled (I'm using CMake's Debug setting), but this code fails the first time a thread is run. The stack on the thread is set up so that this function (ThreadSwitch) will return into the thread's target function. — Jack Maloney
It may be reasonable to mark this function as "naked" so the compiler won't generate the stack-frame instructions before and after the function's body (prologue and epilogue). GCC has __attribute__((naked)) for this; maybe the Clang developers added it for backward compatibility. But this also makes it hard to use local variables because, again, the stack pointer is not adjusted, so locals would overwrite the caller's variables. — myaut

Timothy Baldwin · Accepted Answer · 2015-04-14T22:23:17

I strongly advise you not to write programs that depend on undefined behaviour.

Jumps into and out of inline assembly are not permitted because the compiler can't analyse control flow it doesn't know about. Upon thread creation, you jump into the asm statement from nowhere and then leave it. To avoid these implicit jumps, you need to save and restore the registers, including %rip, in the same asm statement.

All registers that an asm statement alters must be listed as outputs or clobbers. For a thread switch routine, that is every register whose value is not saved, since the other threads alter them. If you do not list them, the compiler will incorrectly assume that they are unchanged.
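
As a minimal sketch of the clobber rule (not the thread switch itself), assuming x86-64 and GCC/Clang extended asm: the hypothetical function add_via_rcx below uses %rcx as scratch space, so %rcx is named in the clobber list. Without that entry the compiler would be free to keep a live value, or one of the operands, in %rcx across the statement.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: computes a + b using %rcx as scratch. */
uint64_t add_via_rcx(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__("mov %[a], %%rcx\n\t"      /* rcx = a */
            "add %[b], %%rcx\n\t"      /* rcx += b (also changes the flags) */
            "mov %%rcx, %[result]\n\t" /* result is written only after both inputs are read */
            : [result] "=r"(result)
            : [a] "r"(a), [b] "r"(b)
            : "rcx", "cc");            /* tell the compiler that rcx and the flags are altered */
    return result;
}
```

A thread switch would need the same treatment for every register it does not save and restore itself, which is a large part of why a plain assembly routine ends up simpler.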

An asm statement must avoid overwriting its inputs before they are used. In your code, nothing prohibits the compiler from storing the variable r12 in the register %r14, in which case the earlier mov into %r14 destroys r12 before it is read.
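
One way to express this to the compiler is the early-clobber modifier. The sketch below assumes x86-64 and GCC/Clang extended asm, and the function name sum_then_double is made up for illustration. The output is written by the first instruction while the input b is still needed, so it is declared "=&r"; with a plain "=r" the compiler could legally give the output the same register as b.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: returns (a + b) * 2. */
uint64_t sum_then_double(uint64_t a, uint64_t b) {
    uint64_t out;
    __asm__("mov %[a], %[out]\n\t"   /* out is written here ... */
            "add %[b], %[out]\n\t"   /* ... while b has not been read yet */
            "add %[out], %[out]\n\t" /* out = out * 2 */
            : [out] "=&r"(out)       /* & = early clobber: never shares a register with an input */
            : [a] "r"(a), [b] "r"(b)
            : "cc");
    return out;
}
```

The code in the question writes fixed registers such as %r14 rather than an output operand, so there the equivalent protection would come from listing those registers as clobbers.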

Your lock is either pointless or inadequate.

It is much simpler to write your function entirely in assembly, as in the tutorial you cite.
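
A rough sketch of that approach, assuming x86-64, the System V calling convention, and GCC or Clang. The names cpu_context, context_switch, worker and run_demo are hypothetical, the struct layout is only modelled on the fields used in the question, and floating-point control state is ignored. The switch routine is written as file-scope assembly, so the compiler never generates a prologue or epilogue around it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical register block; field order must match the offsets used below. */
struct cpu_context {
    uint64_t rsp, r15, r14, r13, r12, rbx, rbp;
};

/* The asm label pins the symbol name so it matches on both ELF and Mach-O. */
void context_switch(struct cpu_context *out, struct cpu_context *in)
    __asm__("context_switch");

/* out arrives in %rdi and in arrives in %rsi. */
__asm__(
    ".text\n"
    ".globl context_switch\n"
    "context_switch:\n"
    "    movq %rsp, 0x00(%rdi)\n" /* save the outgoing thread's callee-saved state */
    "    movq %r15, 0x08(%rdi)\n"
    "    movq %r14, 0x10(%rdi)\n"
    "    movq %r13, 0x18(%rdi)\n"
    "    movq %r12, 0x20(%rdi)\n"
    "    movq %rbx, 0x28(%rdi)\n"
    "    movq %rbp, 0x30(%rdi)\n"
    "    movq 0x00(%rsi), %rsp\n" /* load the incoming thread's state */
    "    movq 0x08(%rsi), %r15\n"
    "    movq 0x10(%rsi), %r14\n"
    "    movq 0x18(%rsi), %r13\n"
    "    movq 0x20(%rsi), %r12\n"
    "    movq 0x28(%rsi), %rbx\n"
    "    movq 0x30(%rsi), %rbp\n"
    "    ret\n"                   /* pops the return address from the incoming stack */
);

static struct cpu_context main_ctx, worker_ctx;
static volatile int worker_steps; /* changes while the main context is suspended */

static void worker(void) {
    worker_steps = 1;
    context_switch(&worker_ctx, &main_ctx); /* yield back to run_demo */
    worker_steps = 2;
    context_switch(&worker_ctx, &main_ctx);
    abort();                                /* not resumed again; must never return */
}

/* Runs worker on its own stack, resuming it twice; returns 2 on success. */
int run_demo(void) {
    size_t size = 64 * 1024;
    char *stack = malloc(size);
    if (!stack) return -1;
    uint64_t *top = (uint64_t *)(((uintptr_t)stack + size) & ~(uintptr_t)15);
    top[-1] = 0;                            /* fake return address for worker */
    top[-2] = (uint64_t)(uintptr_t)worker;  /* popped by the first ret */
    worker_ctx.rsp = (uint64_t)(uintptr_t)&top[-2];

    context_switch(&main_ctx, &worker_ctx); /* first entry into worker */
    if (worker_steps != 1) { free(stack); return -1; }
    context_switch(&main_ctx, &worker_ctx); /* resume worker after its yield */
    free(stack);
    return worker_steps;
}
```

Because the switch is an ordinary function call from the compiler's point of view, caller-saved registers are already assumed dead across it, and only the callee-saved set plus %rsp has to be stored. The saved return address on each stack plays the role of %rip.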
