12
votes

I'm interested in building a static ELF program without (g)libc, using unistd.h provided by the Linux headers.

I've read through these articles/question which give a rough idea of what I'm trying to do, but not quite: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Compiling without libc

https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

I have basic code which depends only on unistd.h. My understanding is that each of those functions is provided by the kernel, so libc should not be needed. Here's the path I've taken that seems the most promising:

    $ gcc -I /usr/include/asm/ -nostdlib grabbytes.c -o grabbytesstatic
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
    /tmp/ccn1mSkn.o: In function `main':
    grabbytes.c:(.text+0x38): undefined reference to `open'
    grabbytes.c:(.text+0x64): undefined reference to `lseek'
    grabbytes.c:(.text+0x8f): undefined reference to `lseek'
    grabbytes.c:(.text+0xaa): undefined reference to `read'
    grabbytes.c:(.text+0xc5): undefined reference to `write'
    grabbytes.c:(.text+0xe0): undefined reference to `read'
    collect2: error: ld returned 1 exit status

Before this, I had to manually define SEEK_END and SEEK_SET according to the values found in the kernel headers. Else it would error saying that those were not defined, which makes sense.

I imagine that I need to link into an unstripped vmlinux to provide the symbols to utilize. However, I read through the symbols and while there were plenty of llseeks, they were not llseek verbatim.

So my question can go in a few directions:

How can I specify an ELF file to utilize symbols from? And I'm guessing if/how that's possible, the symbols won't match up. If this is correct, is there an existing header file which will redefine llseek and default_llseek or whatever is exactly in the kernel?

Is there a better way to write Posix code in C without a libc?

My goal is to write or port fairly standard C code using (perhaps solely) unistd.h and invoke it without libc. I'm probably okay without a few unistd functions, and am not sure which ones exist "purely" as kernel calls or not. I love assembly, but that's not my goal here. Hoping to stay as strictly C as possible (I'm fine with a few external assembly files if I have to), to allow for a libc-less static system at some point.

Thank you for reading!

I initially thought you wanted to use this static binary from userspace (in which case the answer is that you need syscall wrappers if you want to use syscalls, either from libc or else write your own). But then you mentioned linking against the (unstripped) kernel so I guess you expect to run this code directly on bare metal (i.e. instead of the kernel). Please clarify your question on this point. – Celada
Thanks for replying! I meant to link using the kernel as a symbol table reference and run it in the userland of the Linux host. I'll search for existing syscall wrappers and see if one is similar to what I'm trying to do. – sega01
OK, well if you intend to run it in userspace then you can't link against the kernel (if successful, that would pull the in-kernel implementations of those system calls into your code, which is not what you want: you want to call into the kernel). You must implement open() and read() yourself by invoking the proper actions as specified by the kernel ABI, which usually involves setting up registers and then executing some kind of CPU trap instruction. The problem is that the details of this are EXTREMELY architecture-specific (ARM vs. x86, etc...) and complicated by things like vsyscalls. – Celada
I don't see the point of doing this. Use libc - there is not much overhead if you are not using complicated functions - use -static, and you will get a binary that contains only the functions you want. What exactly is the purpose of not using libc? Note that you can't call the kernel from usermode without some sort of syscall wrapping - as you do need the appropriate calling method to transition from user-mode to kernel mode - this can not be done in pure C, needs to be written in assembler for the appropriate processor [and is subject to change if the kernel changes]. – Mats Petersson
I guess that the ideal scenario would be inline assembly header files. I found this question and it produces a result which is almost what I want, but I can't get argc/argv working with void _start(). @MatsPetersson: There's a lot of overhead with glibc, it results in 800KB or larger files, even if only using unistd.h. As far as I know, everything in my code is just syscalls, so I don't see why I can't simply have gcc generate code calling those directly through the Linux headers. – sega01

2 Answers

6
votes

If you're looking to write POSIX code in C, the abandonment of libc is not going to be helpful. Although you could implement a syscall function in assembler, and copy structures and defines from the kernel header, you would essentially be writing your own libc, which almost certainly would not be POSIX compliant. With all the great libc implementations out there, there's almost no reason to begin implementing your own.
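For illustration only, here is a rough sketch of what such a hand-rolled syscall function might look like. It assumes x86_64 Linux and GCC-style inline assembly; the names raw_syscall3 and raw_write are made up for this example and do not come from any library. The syscall numbers are taken from the kernel's <asm/unistd.h>. Like the kernel ABI itself, it returns a negative errno value on failure instead of setting errno.

```c
#include <asm/unistd.h>   /* __NR_write etc. from the kernel headers */

#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch only covers the x86_64 Linux syscall ABI."
#endif

/*
 * Hypothetical helper: issue a system call with up to three arguments.
 * x86_64 Linux convention: number in rax, arguments in rdi, rsi, rdx;
 * the syscall instruction clobbers rcx and r11; the result is in rax.
 */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* >= 0 on success, -errno on failure */
}

/* Hypothetical stand-in for write(); no errno handling. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(__NR_write, (long)fd, (long)buf, (long)len);
}
```

A call such as raw_write(1, "hi\n", 3) then behaves much like write(), minus errno. Other architectures need a different instruction and different registers, which is the per-platform work a libc already does for you.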

dietlibc and musl libc are both frugal libc implementations which yield impressively small binaries. The linker is generally smart; as long as a library is written to avoid accidentally pulling in numerous dependencies, only the functions you use will actually be linked into your program.

Here is a simple hello world program:

#include <unistd.h>

int main(){
    char str[] = "Hello, World!\n";
    write(1, str, sizeof str - 1);
    return 0;
}

Compiling it with musl as below yields a binary of less than 3K:

$ musl-gcc -Os -static hello.c
$ strip a.out 
$ wc -c a.out
2800 a.out

dietlibc produces an even smaller binary, less than 1.5K:

$ diet -Os gcc hello.c
$ strip a.out 
$ wc -c a.out
1360 a.out
4
votes

This is far from ideal, but a little bit of (x86_64) assembler has me down to just under 5KB. Most of that is "other things than code": the actual code is under 1KB (771 bytes, to be precise), but the file size is much larger, I think because the code size is rounded up to 4KB and then some header/footer/extra stuff is added to that.

Here's what I did: gcc -g -static -nostdlib -o glibc start.s glibc.c -Os -lc

glibc.c contains:

#include <unistd.h>

int main()
{
    const char str[] = "Hello, World!\n";
    write(1, str, sizeof(str) - 1);  /* - 1: don't write the trailing NUL */

    _exit(0);
}

start.s contains:

    .globl _start
_start: 
    xor %ebp, %ebp
    mov %rdx, %r9
    mov %rsp, %rdx
    and $~15, %rsp
    push    $0
    push    %rsp

    call    main

    hlt


    .globl _exit
_exit:
    //  We know %RDI already has the exit code...
    mov $0x3c, %eax
    syscall
    hlt

The main point of this is to show that it's not the system call part of glibc that takes up a lot of space, but the "prepare things" part. Beware that if you were to call, for example, printf, possibly even (v)sprintf, or exit(), or any other "standard library" function, you are in the land of "nobody knows what will happen".

Edit: Updated "start.s" to put argc/argv in the right places:

_start: 
    xor %ebp, %ebp
    mov %rdx, %r9
    pop     %rdi
    mov %rsp, %rsi
    and $~15, %rsp
    push    %rax
    push    %rsp

    // %rdi = argc, %rsi=argv
    call    main

Note that I've changed which register contains what, so that it matches main's parameters. I had them in slightly the wrong order in the previous code.
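To make the register shuffling more concrete: on x86_64 Linux the kernel enters _start with argc at the top of the stack, followed by the argv pointers, a NULL, the environment pointers, and another NULL. The sketch below decodes that layout in C from a pointer to the initial stack. The struct and function names are made up for illustration, and it assumes every stack slot is pointer-sized (true on x86_64 Linux).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what _start finds on the initial stack. */
struct entry_args {
    int    argc;
    char **argv;
    char **envp;
};

/*
 * sp points at the first slot of the initial stack:
 *   sp[0]            argc (stored in a pointer-sized slot)
 *   sp[1..argc]      argv[0..argc-1]
 *   sp[argc+1]       NULL terminating argv
 *   sp[argc+2..]     envp entries, NULL-terminated
 */
static struct entry_args decode_entry_stack(char **sp)
{
    struct entry_args a;
    a.argc = (int)(intptr_t)sp[0];
    a.argv = sp + 1;
    a.envp = a.argv + a.argc + 1;   /* skip argv and its NULL */
    return a;
}
```

In the assembly above, pop %rdi reads the argc slot and the following mov %rsp, %rsi leaves %rsi pointing at the argv array, which is the same split this function performs.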