I am compiling a NASM 64-bit shared object in Linux using the NASM compiler and linking with ld. It compiles to an object file using the following command:

sudo nasm -felf64 Test_File.asm

I link with ld:

sudo ld -shared Test_File.o -o Test_File.so

and I get the following errors:

Relocation R_X86_64_32S against '.data' can not be used when making a shared object; recompile with -fPIC

ld: final link failed: Nonrepresentable section on output

Unfortunately, the NASM compiler does not have a -fPIC option.

After reading many resources on writing position-independent code for 64-bit shared libraries in Linux, I understand the issue very well, but I still don't have a clear idea of what instruction changes I need to make to be position-independent in 64-bit NASM. For example, do all instructions involving named variables need to be "rel" -- for example, movsd xmm0,[rel abc] instead of movsd xmm0,[abc]? I know that R_X86_64_32S indicates 32-bit addressing, but I don't have any 32-bit addressing in my code.

Also, there are significant differences between 32-bit and 64-bit on how position independent code is written, and some of the resources concentrate only on 32-bit code. Even Section 9.2 Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries in the NASM manual is not clear on how 64-bit code must be altered for position-independent code. That section focuses on 32-bit code (using the global offset table), which is not (based on other research) used for 64-bit code.

  1. The file is headed with [BITS 64] and [default rel], as required.

  2. The data section is declared as section .data align=16

  3. Every variable in the .data section is defined as dq, for example, number: dq 0.

  4. The top of the file contains the exports in this format: global ABC:function.

I suspect that only data movement instructions would be affected -- math instructions would not. For the external calls to realloc, I added the wrt ..plt special symbol, but I still get the same errors.

Here are my questions:

  1. Do all mov instructions need to be rewritten with the "rel" keyword, for example, mov rax,[rel abc] instead of mov rax,[abc]?

  2. Do lea instructions need to be changed (e.g., lea rdi,[rel abc])?

  3. Are there any other instruction types that need special handling?

I am not posting the entire (very long) NASM code listing here because I'm not looking for line-by-line analysis. I just want to know what instruction types (for example, mov, cmp, jmp, lea) need to be rewritten for 64-bit relative addressing, and how. Does it involve only accesses to variables defined in the data section (e.g., mov rcx,[abc] where abc is defined in the data section as abc: dq 0)?

To summarize, my question is: what changes do I need to make for position-independent code for 64-bit NASM, since the NASM compiler does not have a -fPIC option? I certainly don't mean line-by-line, but what types of instructions need to be added or rewritten.

Thanks very much.

1 Answer

Unfortunately, the NASM compiler does not have a -fPIC option.

Of course not; that's a code-gen option for a compiler. NASM is an assembler, not a compiler; the instructions it assembles are set by the source file, not command-line options. (The error message is assuming people are using ld on compiler output, not hand-written asm.)

Recompile = redo generating the asm instructions, not reassemble the same asm with different options. The compiler is your brain.


  1. Do all mov instructions need to be rewritten with the "rel" keyword

No, you can just use default rel at the top of your file like a normal person instead of modifying every addressing mode to explicitly use [rel foo].
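As a minimal sketch of what that looks like (the label abc is taken from the question; the function name load_abc is made up for illustration), a single default rel makes every plain [label] operand RIP-relative, whichever instruction it appears in:

```nasm
default rel                 ; from here on, [label] means [rel label]

section .data align=16
abc:    dq 0                ; variable named as in the question

section .text
global load_abc:function    ; hypothetical exported function
load_abc:
    mov   rax, [abc]        ; RIP-relative because of default rel
    movsd xmm0, [abc]       ; same addressing mode, different instruction
    cmp   qword [abc], 0    ; cmp, add, etc. need no special treatment either
    ret
```

An explicit [abs abc] would still override the default for a single operand, which is exactly what you don't want in a shared object.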

  2. and 3. Are there any other instruction types that need special handling?

It's nothing to do with the mov instruction, everything to do with the addressing mode. All instructions (including LEA) use the same ModR/M + optional SIB + disp0/8/32 encoding for addressing modes. (Except for one form of mov which can use a 64-bit absolute address when loading/storing AL/AX/EAX/RAX. But you don't want that either.)

You also need to avoid any use of addresses as 32-bit absolute immediate operands. So if you're putting an address into a register, you need a RIP-relative LEA instead of the more efficient 5-byte mov-immediate that you could use in position-dependent code.

;; putting a label address into a register
default rel
    mov edi, my_string     ; optimal in position-dependent executables on Linux
    lea rdi, [my_string]   ; optimal otherwise, best you can do for PIC/PIE

    mov rdi, my_string     ; Never use: 64-bit absolute is inefficient

The only time you should use 64-bit absolute addresses is in .data or .rodata for the contents of a jump table or other pointers to static addresses. Not in code; use RIP-relative instead.
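A sketch of that split, assuming a hypothetical two-entry dispatch function (all names here are invented for illustration): the table holds 64-bit absolute addresses as data, while the code reaches the table with a RIP-relative LEA. The table is placed in .data in this sketch, since the dynamic linker has to write the final addresses into it at load time.

```nasm
default rel

section .data
jump_table:
    dq case0, case1             ; 64-bit absolute addresses, fixed up at load time

section .text
global dispatch:function        ; hypothetical: edi = index (0 or 1), returns eax
dispatch:
    lea   rdx, [jump_table]     ; RIP-relative address of the table
    mov   eax, edi              ; zero-extend the index into rax
    jmp   qword [rdx + rax*8]   ; indirect jump through the table entry
case0:
    mov   eax, 10
    ret
case1:
    mov   eax, 20
    ret
```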


Obviously you have to avoid [disp32 + reg] addressing modes like [array + rdi] or [array + rdx*4]. The only RIP-relative addressing mode is [RIP + rel32]; other modes still use the 32-bit displacement as a sign-extended 32-bit absolute value (so it can be a constant offset like 1024, not an address at all).
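The usual workaround is to get the base address into a register with a RIP-relative LEA and index from that register. A minimal sketch, with a made-up array and function name:

```nasm
default rel

section .data align=16
array:  dd 10, 20, 30, 40       ; hypothetical data

section .text
global get_elem:function        ; hypothetical: edi = index, returns eax
get_elem:
    ; mov eax, [array + rdi*4]  ; not position-independent: array would be an absolute disp32
    lea   rsi, [array]          ; RIP-relative, the only mode that can carry the label
    mov   edx, edi              ; zero-extend the index into rdx
    mov   eax, [rsi + rdx*4]    ; base register + scaled index, no absolute address
    ret
```

This costs one extra instruction, but the LEA can be hoisted out of a loop.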

See also the related question "Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array": MachO64 never allows 32-bit absolute addresses, so it has the same restrictions as a Linux/ELF PIC object.

I know that R_X86_64_32S indicates 32-bit addressing, but I don't have any 32-bit addressing in my code.

[abs foo] is a disp32 sign-extended to 64 bits. That's why the relocation type is 32S. By contrast, mov edi, foo uses R_X86_64_32.

The address size is not 32-bit, but the absolute address still has to be encoded as a 32-bit signed integer. That isn't allowed in a PIE/PIC object, which has to be relocatable anywhere in the 64-bit address space.


Related:


PIC libraries are expected to support symbol interposition. See Sorry state of dynamic libraries on Linux for more.
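For symbols that live outside the library, or that may be interposed, accesses go through the PLT or GOT. A sketch using NASM's wrt ..plt and wrt ..got special symbols; realloc is the function from the question, environ is just an example of a libc data symbol, and the two wrapper function names are made up:

```nasm
default rel
extern realloc                  ; libc function, as in the question
extern environ                  ; libc data symbol, used only as an example

section .text
global grow_buffer:function     ; hypothetical: rdi = old pointer, rsi = new size
grow_buffer:
    sub   rsp, 8                ; keep the stack 16-byte aligned at the call
    call  realloc wrt ..plt     ; call through the PLT
    add   rsp, 8
    ret

global get_environ:function     ; hypothetical: returns the value of environ in rax
get_environ:
    mov   rax, [rel environ wrt ..got]  ; load the address of environ from its GOT entry
    mov   rax, [rax]                    ; then load the value itself
    ret
```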

If you want efficient internal access to your own global symbols (without going through GOT), you may need to define weak aliases for them that have "hidden" ELF visibility. Or simply put 2 labels at the same place, one global one hidden. See 7.9.5 elf Extensions to the GLOBAL Directive in the NASM manual:

   global   hashlookup:function hidden

Also, the NASM manual notes:

Declaring the type and size of global symbols is necessary when writing shared library code. For more information, see section 9.2.4.
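A sketch of the two-labels approach, assuming a NASM version that accepts visibility keywords on global; hashlookup is the manual's name, while hashlookup_local and caller are invented for illustration:

```nasm
default rel

section .text
global hashlookup:function                  ; exported with default visibility
global hashlookup_local:function hidden     ; hypothetical alias, not exported from the .so
global caller:function                      ; hypothetical user of the function

hashlookup:
hashlookup_local:                           ; second label at the same address
    xor   eax, eax                          ; placeholder body
    ret

caller:
    call  hashlookup_local                  ; direct call: no PLT, cannot be interposed
    ret
```

Code outside the library still sees and calls hashlookup, while internal calls bind directly to the hidden name.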