
I have the following code which I'm using with clang on macOS:

.intel_syntax noprefix

.data

hello:  .ascii  "Hello world\n"
hello_len = . - hello

.text

.globl  _main

_main:
        mov     rax, 0x2000004
        mov     rdi, 1
        lea     rsi, [rip + hello]
        mov     rdx, hello_len       # <-------
        syscall

        mov     rax, 0x2000001
        syscall

While it looks like it should print "Hello World" and exit, it actually segfaults. It turns out it's because mov rdx, hello_len actually tries to move the value that is at address hello_len, not the value of hello_len itself.

If I used AT&T syntax, the line would be movq $hello_len, %rdx which works properly. What's the equivalent in clang's version of GAS intel syntax?

You should report this clang bug on bugs.llvm.org. At least the fact that it can't assemble its own output. - Peter Cordes

2 Answers


With real GAS (on Linux), your code assembles to a mov rdx, sign_extended_imm32 like you want.

But yes, clang unfortunately assembles it to mov rdx, [0xc]. That may or may not be a bug, but it's definitely an incompatibility. (macOS's gcc command is not the GNU Compiler Collection at all, it's Apple Clang: LLVM backend, clang frontend, absolutely nothing to do with the GNU project.)

OFFSET hello_len doesn't seem to work. (I had incorrectly assumed it would on first guess, but clang doesn't support the OFFSET operator; its .intel_syntax is not fully usable.)

This clang bug has already been reported. See also Why does this simple assembly program work in AT&T syntax but not Intel syntax?

Clang can't even assemble its own .intel_syntax noprefix output.
There may not be a way to get clang Intel syntax to use a symbol's value (address) as an immediate.

// hello.c
char hello[] = "abcdef";
char *foo() { return hello; }

clang -S prints mov eax, offset hello, which won't assemble with clang's built-in assembler! https://godbolt.org/z/x7vmm4

$ clang -fno-pie -O1 -S -masm=intel hello.c
$ clang -c hello.s
hello.s:10:18: error: cannot use more than one symbol in memory operand
        mov     eax, offset hello
$ clang --version
clang version 8.0.1 (tags/RELEASE_801/final)
Target: x86_64-pc-linux-gnu

IMO this is a bug; you should report it on clang's bug tracker, https://bugs.llvm.org

(Linux non-PIE executables can take advantage of static addresses being in the low 32 bits of virtual address space by using mov r32, imm32 instead of RIP-relative LEA. And of course not mov r64, imm64.)
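
To make the size difference between those mov forms concrete, here is a small C sketch of my own (not output from any assembler) that picks the shortest form for a given constant. The names pick_mov_form and mov_form_length are hypothetical, and the byte counts assume a legacy destination register such as rax or rdi; r8 to r15 need a REX prefix even for the 32-bit form.

```c
#include <stdint.h>

/* The three ways x86-64 can load an immediate into a 64-bit register. */
enum mov_form {
    MOV_R32_IMM32,    /* b8+r id: writing the 32-bit register zero-extends to 64 bits */
    MOV_R64_SX_IMM32, /* REX.W c7 /0 id: imm32 is sign-extended to 64 bits */
    MOV_R64_IMM64     /* REX.W b8+r io: full 64-bit immediate */
};

/* Pick the shortest form that can represent `value` exactly. */
enum mov_form pick_mov_form(uint64_t value) {
    if (value <= UINT32_MAX)
        return MOV_R32_IMM32;
    if (value >= 0xFFFFFFFF80000000ULL)   /* a negative int32 after sign extension */
        return MOV_R64_SX_IMM32;
    return MOV_R64_IMM64;
}

/* Encoded instruction length in bytes for a legacy (non-REX) register. */
int mov_form_length(enum mov_form form) {
    switch (form) {
    case MOV_R32_IMM32:    return 5;
    case MOV_R64_SX_IMM32: return 7;
    default:               return 10;
    }
}
```

For 0x2000004 this picks the 5-byte mov r32, imm32 form, whereas writing mov rax, 0x2000004 in the source asks for the 7-byte sign-extended form.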

Workarounds: you can't just use the C preprocessor. The expression . - hello is context-sensitive; it has a different value when . is at a different position, so a text substitution wouldn't work.
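
A toy model in C may help show why text substitution fails. This is only a sketch under my own assumptions: struct section, emit_ascii and dot_minus are hypothetical names, and the "assembler" does nothing but advance a location counter.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one assembler section; `dot` plays the role of `.`. */
struct section {
    size_t dot;
};

/* Model a `label: .ascii "text"` line: return the label's address
   and advance the location counter past the emitted bytes. */
size_t emit_ascii(struct section *s, const char *text) {
    size_t label = s->dot;
    s->dot += strlen(text);
    return label;
}

/* Model evaluating `. - label` at the current position. */
size_t dot_minus(const struct section *s, size_t label) {
    return s->dot - label;
}
```

Evaluating dot_minus right after emitting "Hello world\n" at address 0 gives 12, but evaluating the same expression after emitting five more bytes gives 17. A symbol defined with = captures the value once, where it is written; a macro would re-evaluate the expression at every point of use.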

Ugly workaround: switch to .att_syntax and back

Switch to .att_syntax just for mov $hello_len, %edx, then switch back to .intel_syntax noprefix.

Ugly and inefficient workaround: lea

This won't work for 64-bit constants, but you can use lea to put a symbol address into a register.

Unfortunately clang/LLVM always uses a disp32 addressing mode, even for register + small constant, when the small constant is a named symbol. I guess it really is treating it like an address that might have a relocation.

Given this source:

##  your .rodata and  =  or .equ symbol definitions

        mov     eax, 0x2000004             # optimized from RAX
        mov     edi, 1
        lea     rsi, [rip + hello]
        mov     edx, hello_len             # load
        lea     edx, [hello_len]           # absolute disp32
        lea     edx, [rdi-1 + hello_len]   # reg + disp8 hopefully
#       mov     esi, offset hello          # clang chokes.
#       mov     rdx, OFFSET FLAT hello_len # clang still chokes
.att_syntax
       lea    -1+hello_len(%rdi), %edx
       lea    -1+12(%rdi), %edx
       mov    $hello_len, %edx
.intel_syntax noprefix
        syscall

        mov     rax, 0x2000001
        syscall

clang assembles it to this machine code, as disassembled by objdump -drwC -Mintel. Note that the LEA needs a ModRM + SIB to encode a 32-bit absolute addressing mode in 64-bit code.

   0:   b8 04 00 00 02          mov    eax,0x2000004       # efficient 5-byte mov r32, imm32
   5:   bf 01 00 00 00          mov    edi,0x1
                                                            # RIP-relative LEA
   a:   48 8d 35 00 00 00 00    lea    rsi,[rip+0x0]        # 11 <_main+0x11>   d: R_X86_64_PC32        .data-0x4

  11:   8b 14 25 0c 00 00 00    mov    edx,DWORD PTR ds:0xc   # the load we didn't want
  18:   8d 14 25 0c 00 00 00    lea    edx,ds:0xc             # LEA from the same [disp32] addressing mode.
  1f:   8d 97 0b 00 00 00       lea    edx,[rdi+0xb]          # [rdi+disp32] addressing mode, missed optimization to disp8
  25:   8d 97 0b 00 00 00       lea    edx,[rdi+0xb]          # AT&T lea    -1+hello_len(%rdi), %edx same problem
  2b:   8d 57 0b                lea    edx,[rdi+0xb]          # AT&T with lea hard-coded -1+12(%rdi)
  2e:   ba 0c 00 00 00          mov    edx,0xc                # AT&T mov    $hello_len, %edx

  33:   0f 05                   syscall 
  35:   48 c7 c0 01 00 00 02    mov    rax,0x2000001          # inefficient mov r64, sign_extended_imm32 from your source
  3c:   0f 05                   syscall 

GAS assembling the same source makes 8d 57 0b lea edx,[rdi+0xb] for the lea edx, [rdi-1 + hello_len] version.
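
To read the ModRM and SIB bytes in that disassembly, a small decoder helps. The following C sketch is my own illustration; the struct and function names are hypothetical, and it only covers the two special mod=00 cases that matter here in 64-bit mode.

```c
#include <stdint.h>

struct modrm { unsigned mod, reg, rm; };
struct sib   { unsigned scale, index, base; };

/* Split a ModRM byte into its mod (2 bits), reg (3 bits), rm (3 bits) fields. */
struct modrm decode_modrm(uint8_t b) {
    struct modrm m = { (unsigned)b >> 6, ((unsigned)b >> 3) & 7u, (unsigned)b & 7u };
    return m;
}

/* Split a SIB byte into scale (2 bits), index (3 bits), base (3 bits). */
struct sib decode_sib(uint8_t b) {
    struct sib s = { (unsigned)b >> 6, ((unsigned)b >> 3) & 7u, (unsigned)b & 7u };
    return s;
}

/* mod=00, rm=101 without a SIB byte means [RIP + rel32] in 64-bit mode. */
int is_rip_relative(uint8_t modrm_byte) {
    struct modrm m = decode_modrm(modrm_byte);
    return m.mod == 0 && m.rm == 5;
}

/* mod=00, rm=100 (SIB follows) with index=100 (none) and base=101 (none)
   means an absolute [disp32] with no registers involved. */
int is_absolute_disp32(uint8_t modrm_byte, uint8_t sib_byte) {
    struct modrm m = decode_modrm(modrm_byte);
    struct sib s = decode_sib(sib_byte);
    return m.mod == 0 && m.rm == 4 && s.index == 4 && s.base == 5;
}
```

Applied to the listing: 0x35 in lea rsi,[rip+0x0] is RIP-relative with reg=6 (rsi), 0x14 0x25 in lea edx,ds:0xc is the absolute form with reg=2 (edx), and 0x97 versus 0x57 differ only in mod (2 for disp32, 1 for disp8).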

See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code/132985#132985 - LEA from a known-constant register is a win for code size with nearby / small constants, and is actually fine for performance (as long as the known constant got that way without a dependency on a long chain of calculations).

But as you can see, clang fails to optimize that and still uses a reg+disp32 addressing mode even when the displacement would fit in a disp8. It's still slightly better code-size than [abs disp32] which requires a SIB byte; without a SIB byte that encoding means [RIP + rel32].
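
The choice between the two encodings can be sketched in C. This is a hypothetical helper of mine (encode_lea_edx_rdi is not a real assembler API), hard-wired to lea edx, [rdi + disp]; the force_disp32 flag mimics what clang appears to do when the displacement is a named symbol. A real assembler would also use mod=00 with no displacement when disp is 0, which this sketch skips.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `lea edx, [rdi + disp]` into out[] and return the length in bytes.
   Uses disp8 when the displacement fits in a signed byte, unless
   force_disp32 is nonzero. */
size_t encode_lea_edx_rdi(int32_t disp, int force_disp32, uint8_t out[6]) {
    const unsigned reg = 2;  /* edx */
    const unsigned rm  = 7;  /* rdi */
    size_t n = 0;

    out[n++] = 0x8D;  /* LEA opcode */
    if (!force_disp32 && disp >= -128 && disp <= 127) {
        out[n++] = (uint8_t)(0x40u | (reg << 3) | rm);  /* mod=01: disp8 */
        out[n++] = (uint8_t)disp;
    } else {
        uint32_t u = (uint32_t)disp;
        out[n++] = (uint8_t)(0x80u | (reg << 3) | rm);  /* mod=10: disp32 */
        for (int i = 0; i < 4; i++)
            out[n++] = (uint8_t)(u >> (8 * i));         /* little-endian */
    }
    return n;
}
```

With disp = 11 this produces 8d 57 0b (3 bytes), matching GAS; with force_disp32 set it produces 8d 97 0b 00 00 00 (6 bytes), matching clang's output above.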


If you change your instruction to:

lea rdx, hello_len

it works. In old Unix as, = (or the more verbose .set) operated on lvalues. In that model, hello_len is an address; specifically, the address 12.

I can't recall = in MASM syntax. I recall equ serving a similar purpose, but it was all poorly specified. We mainly used cpp (and occasionally awk) to do the heavy lifting for us, and avoided the asm features.