I cannot find a definitive answer for how to correctly implement error handling for system calls in x86 assembly, under Linux.

The documentation shows that, on return from a system call, register rax contains the result of the call. A value in the range -4095 to -1 indicates an error.

Logically then, if we ‘look’ at the return value in rax, and the value lies within this range, we can conclude that an error has occurred, and we can act on this information appropriately.

However, how do we know that the return value is indeed a negative value? This is really the basis for my question, as my understanding is that a given binary pattern is only a negative value if we treat it as such.

For example, let’s assume, for purposes of illustration, that the return value from a system call is ‘-4000’. Now, the actual return value passed to rax is not literally -4000, but a binary pattern that can be interpreted as such. In the case of interpreting a return value from a system call, how do we differentiate between one possible interpretation of this binary pattern and the other; in other words, how do we know that the return value (binary pattern) does not represent the unsigned equivalent?

Admittedly this would be a relatively large number. However, is this not a legitimately plausible scenario? After all, rax contains only bits, whether or not we treat those bits as representing a negative number is down to interpretation/implementation?

The two examples I have found so far that illustrate error handling for system calls in x86 asm (one, two) approach the problem in a similar manner. First they execute a ‘phantom’ operation, such as ‘or eax,eax’, to set the appropriate flags; next they test the sign flag (SF) to see whether the sign bit is set, and act accordingly.

Again, I do not understand how we can determine from the SF, that the number is in fact negative; it is simply indicating that the sign bit (most significant bit) has been set.

As an example, let’s assume our code made a system call that returned the value 8000000000000000h:

    mov rax,8000000000000000h   ; The illustrative return value from our syscall
    test rax,rax                ; Perform 'test' to set flags accordingly 
    jns Exit                    ; If SF set, 'fall-through' to 'Error'

; Write error message to stdout:
Error:
    mov rax,4                  ; sys_write 
    mov rbx,1                  ; File descriptor 1, stdout
    mov rcx,ErrorMsg           ; Pass offset of message
    mov rdx,ERRORLEN           ; Length of error message
    int 80h                    ; Kernel call

; Exit program:
Exit:
    mov rax,1                  ; exit system call
    mov rbx,0                  ; return a code of zero
    int 80h                    ; make kernel call

In this scenario, we would have (wrongly) assumed an error had occurred, written an error message to stdout and exited the program. I appreciate this is a somewhat unlikely scenario. However, am I wrong in stating this as a possible bug? If so, why?

Alternatively, is the answer simply that no system call under Linux would ever return a value large enough to set the sign bit in a 64-bit number, or in a 32-bit number for that matter?

How should I implement error handling for system calls under Linux in such a way that a scenario such as the one laid out above is avoided?

What is the standard convention for error handling system calls in x86 asm, under Linux?

....................................................

NASM version 2.11.08 | Architecture: x86 | Ubuntu 16.04

1 Answer

The legitimate return values from system calls are always either non-negative (signed) integers or addresses. When the result is a non-negative integer, the negative values are free to be used as error codes, so any negative value is an error.

So the only tricky case is when the return value is an address. It turns out that the addresses corresponding to integers in the range -4096..-1 are all in a kernel reserved page that will never be returned by the kernel -- so any bit pattern in that range will only ever be returned as an error code, and not as a valid address.

In addition, ALL addresses that correspond to negative integers in x86_64 are reserved for the kernel or invalid -- user addresses will always be in the range 0..2^47-1. So for x86_64 you need only check the sign bit (top bit) of %rax -- if it is set, there was an error.

test %rax, %rax
js   error
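To make the "same bits, two interpretations" point concrete, here is a minimal C sketch of the two checks, assuming the x86_64 Linux convention described above. The function names and the `MAX_ERRNO` constant are names chosen for this illustration, not part of any API. It treats the raw contents of rax as an unsigned 64-bit value, so the error range -4095..-1 becomes the top 4095 unsigned values:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative local constant: the largest error number in the convention. */
#define MAX_ERRNO 4095

/* Precise range check: nonzero if the raw register value lies in -4095..-1.
 * Viewed as unsigned, -4095 is 0xFFFFFFFFFFFFF001, so every error value
 * compares >= that bound and every other bit pattern compares below it. */
int is_syscall_error(uint64_t raw)
{
    return raw >= (uint64_t)-(int64_t)MAX_ERRNO;
}

/* Sign-bit-only check: the C equivalent of `test %rax, %rax` / `js`. */
int sign_bit_set(uint64_t raw)
{
    return (raw >> 63) != 0;
}

/* A few illustrative cases, including the value from the question. */
void demo_checks64(void)
{
    assert(is_syscall_error((uint64_t)-1));        /* smallest error code  */
    assert(is_syscall_error((uint64_t)-4095));     /* largest error code   */
    assert(!is_syscall_error((uint64_t)-4096));    /* just below the range */
    assert(!is_syscall_error(0));                  /* plain success        */

    /* 8000000000000000h has the sign bit set but is outside -4095..-1. */
    assert(sign_bit_set(0x8000000000000000ULL));
    assert(!is_syscall_error(0x8000000000000000ULL));
}
```

The two checks disagree only for bit patterns between 2^63 and 2^64 - 4096. Since those correspond to kernel or invalid addresses on x86_64, they are not values a system call hands back to user space, which is why the shorter sign test is enough there.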

For 32-bit x86 code, this is not the case -- some valid addresses are negative numbers. So in that case, you need to explicitly check for the error range, which is actually easiest to do with an unsigned comparison:

cmpl  $0xfffff000, %eax  # unsigned 2^32 - 4096, aka signed -4096
ja    error              # -4095 .. -1 is an error, anything else is non-error
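The same unsigned comparison can be sketched in C to show why it separates error codes from high user addresses. This is only an illustration: the function names are made up here, and the sample address is an arbitrary value with the top bit set, not something taken from a real process.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the raw 32-bit return value lies in -4095..-1, i.e. is
 * strictly above 0xfffff000 when compared as unsigned (the `ja` case). */
int is_syscall_error32(uint32_t raw)
{
    return raw > 0xfffff000u;
}

void demo_checks32(void)
{
    /* -9 as a 32-bit pattern is 0xfffffff7: inside the error range. */
    assert(is_syscall_error32((uint32_t)-9));

    /* Boundary: 0xfffff000 itself (-4096) is not an error value. */
    assert(!is_syscall_error32(0xfffff000u));
    assert(is_syscall_error32(0xfffff001u));

    /* Hypothetical high user address: negative as a signed 32-bit
     * number, so a sign-bit test would misreport it as an error. */
    assert(!is_syscall_error32(0xb7700000u));
}
```

A sign-bit test alone would flag the last value as a failure, which is exactly the bug the question worries about; the range check does not.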