17
votes
void return_input (void)
{ 
   char array[30]; 

   gets (array); 
   printf("%s\n", array); 
}

After compiling it with gcc, this function is converted to the following assembly code:

push   %ebp
mov    %esp,%ebp
sub    $0x28,%esp
mov    %gs:0x14,%eax
mov    %eax,-0x4(%ebp)
xor    %eax,%eax
lea    -0x22(%ebp),%eax
mov    %eax,(%esp)
call   0x8048374 
lea    -0x22(%ebp),%eax
mov    %eax,(%esp)
call   0x80483a4 
mov    -0x4(%ebp),%eax
xor    %gs:0x14,%eax
je     0x80484ac 
call   0x8048394 
leave  
ret  

I don't understand two lines:

mov    %gs:0x14,%eax
xor    %gs:0x14,%eax

What is %gs, and what exactly do these two lines do?

This is the compilation command:

cc -c -mpreferred-stack-boundary=2 -ggdb file.c
3
I suppose these are the segment registers (SS, DS, CS, ES, FS, GS), if I got it right. - Sergey Benner

3 Answers

24
votes

GS is a segment register; its use in Linux can be read up on here (it's basically used for per-thread data).

mov    %gs:0x14,%eax
xor    %gs:0x14,%eax

This code is used to validate that the stack hasn't been smashed or corrupted, using a canary value stored at GS+0x14; see this.

gcc -fstack-protector-strong is on by default in many modern distros; you can use gcc -fno-stack-protector to not add those checks. (On x86, thread-local storage is cheap, so GCC keeps the randomized canary value there, making it somewhat harder to leak.)

3
votes

ES, FS, GS: Extra Segment Registers. Can be used as extra segment registers; also used in special instructions that span segments (like string copies).

Taken from here: http://www.hep.wisc.edu/~pinghc/x86AssmTutorial.htm

Hope it helps.

3
votes

In AT&T-style assembly syntax, the percent sigil generally indicates a register. In x86 family processors from the 386 onwards, GS is one of the so-called segment registers. However, in protected-mode environments, segment registers work as selector registers.

A virtual memory selector represents its own mapping of virtual address space together with its own access regime. In practical terms, %gs:0x14 can be thought of as a reference into an array whose origin is held in %gs (albeit the CPU does a bit of extra dereferencing). On modern GNU/Linux systems, %gs is usually used to point at the thread-local storage region. In the code you're asking about, however, only one item of the TLS matters — the stack canary.
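To make the "array whose origin is held in %gs" picture concrete, here is a minimal sketch in C. The struct `tcb_model` and the helper `load_gs_relative` are hypothetical names invented for this illustration. The field layout only loosely follows what a 32-bit glibc thread control block looks like and should be treated as an assumption. The one thing the sketch relies on is that the canary sits at offset 0x14 from the base.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

/* Hypothetical, simplified model of the start of a 32-bit thread control
 * block. Field names are illustrative assumptions; fixed-width fields keep
 * the offsets the same on any host. */
struct tcb_model {
    uint32_t tcb;               /* offset 0x00 */
    uint32_t dtv;               /* offset 0x04 */
    uint32_t self;              /* offset 0x08 */
    uint32_t multiple_threads;  /* offset 0x0c */
    uint32_t sysinfo;           /* offset 0x10 */
    uint32_t stack_guard;       /* offset 0x14: the stack canary */
};

static_assert(offsetof(struct tcb_model, stack_guard) == 0x14,
              "the modelled canary must sit at offset 0x14");

/* Models `mov %gs:OFFSET,%eax`: read 32 bits at (segment base + offset).
 * On real hardware the CPU looks up the base through the selector in %gs;
 * here the base is passed in as an ordinary pointer. */
uint32_t load_gs_relative(const void *gs_base, size_t offset)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)gs_base + offset, sizeof value);
    return value;
}
```

With a `struct tcb_model` instance standing in for the thread-local region, `load_gs_relative(&instance, 0x14)` returns whatever was stored in `stack_guard`, which is all that `mov %gs:0x14,%eax` does in the question's listing.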

The idea is to detect a buffer overflow by placing a random but constant value on the stack before gets() is called, above its stack frame, and checking whether it is still there after gets() has returned. (It is called a stack canary in memory of the canaries coal miners used to carry, which signalled rising levels of poisonous gases by dying.) In the listing, the canary is the value stored at -0x4(%ebp), directly above the 30-byte array that starts at -0x22(%ebp). gets() has no business overwriting this part of the stack: it is outside its own stack frame, and it is not given a pointer to it. So if the stack canary has died, something has gone wrong in a dangerous way. (C as a programming environment happens to be particularly prone to this kind of failure, and security researchers have learnt to exploit many such failures over the last twenty years or so. Also, gets() is a function that is inherently at risk of overflowing its target buffer.)

You have not offered addresses with your code, but 0x80484ac is likely the address of leave, and the call 0x8048394, which is executed in case of mismatch (that is, jumped over by je 0x80484ac in case of match), is probably a call to __stack_chk_fail(), provided by libc to handle the stack corruption by fleeing the metaphorical poisonous mine.
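The check can be imitated in portable C. The following is only a simulation under stated assumptions, not what GCC emits: `frame_model`, `reference_canary` and `return_input_model` are hypothetical names, the canary is a fixed constant rather than a value randomized at startup, and a mismatch is reported through the return value, whereas the real code would call __stack_chk_fail(). The comments map each step to the instructions in the question.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

/* Stand-in for the value at %gs:0x14. A real canary is randomized when the
 * program starts; a fixed constant keeps this sketch reproducible. */
static const uint32_t reference_canary = 0x5ca1ab1eu;

/* Simulated stack frame: the buffer at the lower addresses and the canary
 * above it, as in the listing (array at -0x22(%ebp), canary at -0x4(%ebp)). */
struct frame_model {
    char     array[30];
    uint32_t canary;
};

/* Returns 1 if the canary survived, 0 if it was overwritten. */
int return_input_model(const char *input, size_t len)
{
    struct frame_model frame;

    /* Keep the simulated overflow inside the model so the demo stays
     * well-defined C. */
    assert(len <= sizeof frame);

    /* mov %gs:0x14,%eax ; mov %eax,-0x4(%ebp) */
    frame.canary = reference_canary;

    /* Stand-in for gets(): copies len bytes starting at the buffer without
     * checking them against sizeof frame.array. */
    memcpy(&frame, input, len);

    /* mov -0x4(%ebp),%eax ; xor %gs:0x14,%eax ; je ...
     * The XOR is zero exactly when the two values are equal. */
    return (frame.canary ^ reference_canary) == 0;
}
```

Copying 30 bytes or fewer leaves the canary untouched and the function returns 1. Copying enough bytes to reach the `canary` field changes it, the XOR becomes non-zero, and the function returns 0, which is the case where the compiled code would fall through to the call instead of taking the je.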

The reason the canonical value of the stack canary is kept in the thread-local storage is that this way, every thread can have its own stack canary. Stacks themselves are normally not shared between threads, so it is natural to also not share the canary value.