4
votes

What should an operating system interrupt handler do for interrupts related to coding mistakes?

For example, I tried dividing by 0 to test my interrupt handling, and my interrupt handler got called. However, because the div instruction did not complete, the saved EIP still points at it rather than at the next instruction, so returning from the handler with iret goes straight back to the faulting div.

  mov ax, 3
  mov dl, 0
  div dl    ; go back here again and again

What is the correct way to handle this interrupt? A few ways I thought of:

  • Change dl to something other than 0. However, I'm not sure the new value of dl would survive, since an interrupt routine is supposed to restore registers on exit, and I don't think silently "correcting" an error by producing a wrong result is a good idea.

  • Retrieve the next instruction after div. However, I haven't thought of any simple and reliable way to get the next instruction.

  • Modify the return address currently at the top of the stack so that it points to some other code. That way, we do not go back to the div instruction anymore.
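
A minimal sketch of the third option, written in C and simulated in user space so it can be compiled and checked on its own. It assumes 32-bit protected mode with no privilege change, where the CPU pushes EFLAGS, CS and EIP and no error code for a divide error. The names `struct interrupt_frame`, `divide_error_handler` and `FAULT_EXIT_STUB` are hypothetical, invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the words the CPU pushes for a same-privilege
   interrupt in 32-bit protected mode, lowest address first. */
struct interrupt_frame {
    uint32_t eip;     /* return address used by iret */
    uint32_t cs;
    uint32_t eflags;
};

/* Hypothetical address of an OS routine that reports the error and
   terminates the offending program. */
#define FAULT_EXIT_STUB 0x00100000u

/* Would be called from the vector 0 assembly stub with a pointer to the
   saved frame. Returns the faulting address so the caller can log it. */
uint32_t divide_error_handler(struct interrupt_frame *frame)
{
    assert(frame != NULL);

    /* A divide error is a fault, so the saved EIP is the div itself. */
    uint32_t faulting_eip = frame->eip;

    /* Overwrite the return address: iret now resumes in the OS stub
       instead of re-executing the faulting instruction. */
    frame->eip = FAULT_EXIT_STUB;

    return faulting_eip;
}
```

In a real handler the pointer would be derived from the stack pointer on entry to the interrupt stub. If the fault came from a lower privilege level, the CPU also pushes SS and ESP, so the frame would be larger than shown here.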

2
Check for a divisor of 0 before the division and flag it as an error instead of executing it? But if you really want to do this, you could decode the instruction at the return address in your divide-by-zero interrupt handler, compute the length of the faulting instruction, and add that length to the return address. This would have the effect of resuming at the next instruction (easier said than done). Your last idea, returning to a generic error handler, is often used and is more reasonable. - Michael Petch
@Michael Petch I am writing an interrupt handler at the OS level, so I cannot assume a userspace program would do that. I think I will settle on redirecting the flow of execution to some other code, since we cannot "correct" the computation, and continuing at the next instruction with a garbage value would make everything after it wrong. That is probably why a C program (compiled with gcc) that divides by 0 reports a floating point exception and stops. - Tu Do
If you are writing an OS, then on a division by zero I'd put an OS error handler on the stack as the return address and return to that, to terminate the user program and inform the user of the reason (division by zero). If user space programs want to override that, your OS could supply some type of system call for a user program to register an error handler. If the user space program has registered a handler, that handler's address is placed on the stack before doing the IRET. The specifics depend on how you have developed your OS. - Michael Petch
If you are writing a POSIX OS then the correct thing to do is to pass a SIGFPE to the process (which will kill that process unless it has a signal handler installed). - dave
@MichaelPetch Thanks for the answer. Could you post it as an answer, so I can accept it? - Tu Do

2 Answers

6
votes

You are right that none of those are particularly good things to do in the case of this particular interrupt. As mentioned in the comments, since you have the address of the instruction you could fetch whatever is at that address, decode the instruction and then advance the pointer to the next instruction... but the code won't have been expecting that!

In POSIX operating systems this exceptional behaviour is covered by the SIGFPE signal. If you were writing an OS and wanted to follow POSIX then your interrupt handler should send that signal to the process. If the process has a handler for that signal then jump to it and allow the process to deal with the error (for example, that is how a try/catch block in a high level language could work... now you know why exceptions are slow!). If there is no signal handler then the process should be killed (and you re-enter your scheduler to work out what to do next... and hope it wasn't PID 1!).
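
To show what this looks like from the process side, here is a minimal sketch in C, assuming a POSIX system. The function name `checked_div` is made up for illustration. Because an integer division by zero is undefined behaviour in C and does not trap on every CPU, the sketch uses `raise(SIGFPE)` as a stand-in for the kernel delivering the signal. The handler leaves via `siglongjmp` rather than returning, since returning would just re-run the faulting instruction in the real case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;

/* Signal handler: abandon the faulting computation and jump back to
   the recovery point saved by sigsetjmp. */
static void on_sigfpe(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Stores a / b in *out and returns 0, or returns -1 if SIGFPE was
   delivered (or the handler could not be installed). */
int checked_div(int a, int b, int *out)
{
    struct sigaction sa, old;
    int status;

    assert(out != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(recover_point, 1) == 0) {
        if (b == 0)
            raise(SIGFPE);   /* stand-in for the kernel's SIGFPE */
        *out = a / b;
        status = 0;
    } else {
        status = -1;         /* reached via siglongjmp from the handler */
    }

    sigaction(SIGFPE, &old, NULL);  /* restore the previous disposition */
    return status;
}
```

Calling `checked_div(3, 0, &q)` returns -1 and leaves the process running, whereas without the handler the default action for SIGFPE would terminate it.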

Of course, this is your OS, and there's no reason you have to follow POSIX if you don't want to! If you have some other fancy way to handle an error in a user program then you can implement that instead.

4
votes

In general, for coding errors there are only 2 choices:

a) Terminate the process. This may or may not include doing other things (logging the error, creating a core dump, etc).

b) Allow the process (that couldn't even do normal execution right) to attempt its own recovery (which is harder to do and test than normal execution). An example of this is signals.
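
The choice between (a) and (b) can be sketched as a small policy function in C. This is a user-space simulation under stated assumptions, not real kernel code: `struct process`, `register_fault_handler`, `dispatch_fault` and `count_fault` are hypothetical names, and the handler is called directly here, whereas a real kernel would arrange for it to run in the process's own context.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*fault_handler_t)(int fault);

enum fault_action { FAULT_TERMINATED, FAULT_HANDLED_BY_PROCESS };

/* Hypothetical per-process record kept by the OS. */
struct process {
    int pid;
    int alive;
    fault_handler_t on_fault;   /* NULL until the process registers one */
};

/* Stand-in for a system call that lets a process opt in to choice (b). */
void register_fault_handler(struct process *p, fault_handler_t handler)
{
    assert(p != NULL);
    p->on_fault = handler;
}

/* Policy applied when a process triggers a fault such as divide error. */
enum fault_action dispatch_fault(struct process *p, int fault)
{
    assert(p != NULL);
    if (p->on_fault == NULL) {
        p->alive = 0;           /* choice (a): terminate the process */
        return FAULT_TERMINATED;
    }
    p->on_fault(fault);         /* choice (b): let the process recover */
    return FAULT_HANDLED_BY_PROCESS;
}

/* Example handler a process might register: it only counts faults. */
int fault_count = 0;
void count_fault(int fault)
{
    (void)fault;
    fault_count++;
}
```

Logging or writing a core dump would fit naturally in the termination branch, before the process is marked as dead.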