x86 - GDT - Can GCC handle different segments (for code and data)?

Question

I have a general question about the GDT and GCC. I have started to write an OS kernel for the purpose of learning, and I currently use GCC to compile the code. Several tutorials on setting up the GDT use the same base address (0) and limit (0xFFFFF) for both the code and the data segment. My first thought as an x86 newbie was that using different base addresses and limits could provide additional protection.

Here is what I have tried (shortened):

Linker script:

ENTRY(_start)

SECTIONS {
    . = 1M;

    _kern_start = .;

    .text ALIGN(4096) : {
        _kern_text_start = .;
        *(.multiboot)
        *(.text)
        _kern_text_end = .;
    }

    .rodata ALIGN(4096) : {
        _kern_rodata_start = .;
        *(.rodata)
        _kern_rodata_end = .;
    }

    .data ALIGN(4096) : {
        _kern_data_start = .;
        *(.data)
        _kern_data_end = .;
    }

    .bss ALIGN(4096) : {
        _kern_bss_start = .;
        *(.bss)
        _kern_bss_end = .;
    }

    .stack ALIGN(4096) : {
        _kern_stack_start = .;
        *(.stack)
        _kern_stack_end = .;
    }

    .heap ALIGN(4096) : {
        _kern_heap_start = .;
        *(.heap)
        _kern_heap_end = .;
    }

    _kern_end = .;
}

I added symbols for each section, then I wrote simple Assembler functions to get the start address and size for each section which I called within C:

Assembler functions (as example):

FUNCTION(_kern_text_get_addr)
    pushl %ebp
    movl %esp, %ebp
    movl $_kern_text_start, %eax
    leave
    ret

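FUNCTION(_kern_text_get_size)
    pushl %ebp
    movl %esp, %ebp
    movl $_kern_text_end, %eax
    /* size = end - start; subtracting the immediate avoids clobbering %ebx, which is callee-saved in the 32-bit C calling convention */
    subl $_kern_text_start, %eax
    leave
    ret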

I used the different sections to set up the code and data segments in the GDT (the data segment is not shown in the following snippet):

uint32_t base;
uint32_t limit;

base = _kern_text_get_addr();
limit = _kern_text_get_size() / 4096;

/* Kernel Code */
gdt_set_entry(&gdt[GDT_KERN_CODE], base, limit, GDT_ACCESS_EXEC | 
                                                GDT_ACCESS_SEGMENT | 
                                                GDT_ACCESS_RING0 | 
                                                GDT_ACCESS_PRESENT, 
                                                GDT_FLAG_SIZE | 
                                                GDT_FLAG_GRAN);
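
For reference, here is a minimal sketch of how a base, limit, access byte and flags nibble are packed into an 8-byte x86 segment descriptor. The names `gdt_encode` and `gdt_page_limit` are hypothetical and are not the `gdt_set_entry` used above, whose implementation is not shown. The sketch assumes the flags nibble is ordered G, D/B, L, AVL from high bit to low, and that with page granularity the limit field holds the index of the last valid 4 KiB page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: limit field for a page-granular (G=1) segment
 * that must cover size_bytes bytes. The field holds the index of the
 * last valid 4 KiB page, so it is the rounded-up page count minus one. */
uint32_t gdt_page_limit(uint32_t size_bytes)
{
    assert(size_bytes > 0);
    uint32_t pages = size_bytes / 4096u + (size_bytes % 4096u != 0);
    return pages - 1u;
}

/* Hypothetical helper: pack the fields into the 64-bit descriptor layout. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15:0  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23:0   -> bits 16..39 */
    d |= (uint64_t)access << 40;                  /* access byte -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19:16 -> bits 48..51 */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* G, D/B, L, AVL -> bits 52..55 */
    d |= (uint64_t)(base >> 24) << 56;            /* base 31:24  -> bits 56..63 */
    return d;
}
```

With these assumptions, the flat ring 0 code segment from the tutorials (base 0, limit 0xFFFFF, access 0x9A, flags 0xC) encodes to 0x00CF9A000000FFFF. Note that `size / 4096` as used above covers either the same range as `gdt_page_limit(size)` or one page more, so it is slightly loose rather than too small.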

Loading the table with the lgdt instruction works. But when I reload the segment registers with a long jump, I get a General Protection (#GP) fault. So I examined the generated machine code. The issue is that when I compile with GCC's default options, the target address of the long jump instruction isn't correct: it would need some kind of address translation to land at the right location. The data segments also use wrong addresses. Okay, I could use paging instead, but even if the question sounds stupid:
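
The mismatch can be illustrated with a small sketch of the arithmetic involved. It assumes the kernel is linked at 1M as in the linker script and that the code segment base is set to the start of .text; the function names and the example offsets are hypothetical. The CPU adds the segment base to every offset, so an address the linker computed as an absolute value is shifted by the base when it is used as an offset, and it can also fall outside the segment limit, which would be one plausible cause of a #GP.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address the CPU forms from a segment base and an offset
 * (32-bit arithmetic, wraps around at 4G). */
uint32_t seg_linear(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* Offset that would have to be used so that base + offset reaches
 * the intended linear address. */
uint32_t seg_offset_for(uint32_t seg_base, uint32_t linear)
{
    return linear - seg_base;
}

/* Highest valid offset of a page-granular (G=1) segment with the
 * given 20-bit limit field. */
uint32_t seg_max_offset(uint32_t limit_field)
{
    assert(limit_field <= 0xFFFFFu);
    return (limit_field << 12) | 0xFFFu;
}
```

For example, with a base of 0x100000, a jump to the linked address 0x100040 would resolve to linear address 0x200040, while the offset that actually reaches 0x100040 is 0x40. If the limit field were 1, the highest valid offset would be 0x1FFF, so 0x100040 would be out of range.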

Is it even possible to use different segments for code and data with GCC? In other words, can GCC handle different segments? I know GCC has a PIC option, but I have not tried it yet.

Nothing GCC does or could do precludes this, but use of segmentation is really backwards. It's a design mistake Intel made in the 80s that should have died back then. — R.. GitHub STOP HELPING ICE
I don't think so. The option -msplit is still in the pdp11 codegen, which does the same thing. The issue you are likely to run into is when the compiler generates something like a switch table that it has to read: it will likely be stuffed in .text, but the codegen doesn't issue a %cs-relative load. At least, that was a problem in gcc circa 1996. The free Watcom compiler handles this. — mevets

Ross Ridge · Accepted Answer · 2020-03-06T14:45:36

With one minor exception, GCC can handle separate code and data segments without issue. The only thing I know of that breaks is the trampolines created when the address of a nested function is taken. These trampolines are created on the stack but are executed as code, so they won't work if code needs to be in a different segment than the data. Since nested functions are a rarely used GCC extension, this shouldn't cause problems in practice. You can still get it to work by providing a means to dynamically allocate and initialize memory in the code segment at run time.

However there's no advantage in doing this. You'd have to divide the 4G 32-bit linear address space into two separate non-overlapping code and data segments to get any kind of security advantage. However, you can get the same security advantage by using no-execute page protection bits without having to fragment the linear address space. This is why all current operating systems use a flat segmentation model, with both the code and data segments having a base 0 and a limit of 4G for 32-bit code. For the same reason, 64-bit x86 CPUs give you no option but to use the flat model in 64-bit mode.

Position independent code isn't affected by segmentation, and whatever problem you're having with the long jump (far jump) instruction won't be solved by using separate code and data segments.
