In short, when I have more than one db string in my .data section, the addresses NASM emits for those labels are wrong. In my testing, they are off by 256 (0x100) bytes in the resulting Mach-O binary.
Software I am using:
- OS X 10.10.5
- nasm: NASM version 2.11.08, installed via Homebrew, as required for x86_64 assembly
- gobjdump: GNU objdump (GNU Binutils) 2.25.1, installed via Homebrew
- clang: Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
What works:
Take for example the following "hello world" NASM assembly.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msg]
mov rdx, len
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msg: db "Hello, world!", 10
len: equ $ - msg
Compiled and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
clang -o main main.o
./main
This works great, and produces the following output:
Hello, world!
What doesn't:
Now, to add another message, we just need to add another string to the data section and another syscall. Simple enough.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msga]
mov rdx, lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msgb]
mov rdx, lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msga: db "Hello, world!", 10
lena: equ $ - msga
msgb: db "Break things!", 10
lenb: equ $ - msgb
Compile and run, same as before, and we get:
Break things!
What?!? Shouldn't we be getting:
Hello, world!
Break things!
What's wrong?:
Something clearly went wrong. Time to disassemble the resulting binary and see what we got.
$ gobjdump -d -M intel main
This produces the following for _main:
0000000100000f7c <_main>:
100000f7c:b8 04 00 00 02 mov eax,0x2000004
100000f81:bf 01 00 00 00 mov edi,0x1
100000f86:48 8d 35 73 01 00 00 lea rsi,[rip+0x173] # 100001100 <msgb+0xf2>
100000f8d:ba 0e 00 00 00 mov edx,0xe
100000f92:0f 05 syscall
100000f94:b8 04 00 00 02 mov eax,0x2000004
100000f99:bf 01 00 00 00 mov edi,0x1
100000f9e:48 8d 35 69 00 00 00 lea rsi,[rip+0x69] # 10000100e <msgb>
100000fa5:ba 0e 00 00 00 mov edx,0xe
100000faa:0f 05 syscall
100000fac:b8 01 00 00 02 mov eax,0x2000001
100000fb1:bf 00 00 00 00 mov edi,0x0
100000fb6:0f 05 syscall
From the comment # 100001100 <msgb+0xf2>, we can see that the first lea is pointing not to the msga symbol but to 0xf2 bytes past msgb, at 100001100. That address contains only null bytes, so nothing visible is printed. Inspecting the binary in a hex editor, I find the actual msga string at file offset 0x1000, i.e. address 100001000. This means the address in the compiled binary is now off by 0x100 (256) bytes, simply because there is now a second db label. What?!?
A sorry excuse for a workaround:
As an experiment, I tried putting each of the db strings into its own ASM/object file and linking all three together. Doing so works.
main.s
global _main
extern _msga
extern _lena
extern _msgb
extern _lenb
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msga]
mov rdx, _lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msgb]
mov rdx, _lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
msga.s
global _msga
global _lena
section .data
_msga: db "Hello, world!", 10
_lena: equ $ - _msga
msgb.s
global _msgb
global _lenb
section .data
_msgb: db "Break things!", 10
_lenb: equ $ - _msgb
Compile and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
/usr/local/bin/nasm -f macho64 -o msga.o msga.s
/usr/local/bin/nasm -f macho64 -o msgb.o msgb.s
clang -o main msga.o msgb.o main.o
./main
Results in:
Hello, world!
Break things!
While this does work, I find it hard to believe this is the best solution.
What is going wrong?
Surely there must be a way to have multiple db labels in one ASM file? Am I doing something wrong in the way I write the ASM? Is this a bug in NASM? Or is this somehow expected behavior, and if so, why? My workaround adds extra work and clutter, so I would greatly appreciate any help with this.