I am debugging a segmentation fault. Here is the code snippet; it faults at the ff_printf call.

for (p = &v[QUEUE], i = 0; i < p->used; i++) {
    queue_t *q = p->data[i];
    ff_printf(F_DB, "  %02u %s\n",
        p->cp, q->tq->queue_name);
}

The seg fault is reported at the ff_printf line. When I debug in gdb, I can resolve p->cp and q->tq->queue_name. F_DB also resolves, since it is an enum. Hence I concluded that the fault was not caused by an invalid dereference.

Disassembling the code gives the following assembly for the ff_printf line of the snippet above.

   0x0000000000449b88 <+360>:   mov    -0x14(%r13),%rax
   0x0000000000449b8c <+364>:   movzwl %r10w,%edx

   0x0000000000449b90 <+368>:   movzwl (%rbx),%r9d
   0x0000000000449b94 <+372>:   mov    $0x56a4d9,%r8d
   0x0000000000449b9a <+378>:   mov    $0x5,%ecx
   0x0000000000449b9f <+383>:   mov    $0x5bb,%esi
   0x0000000000449ba4 <+388>:   mov    $0x56a27b,%edi
   0x0000000000449ba9 <+393>:   mov    (%rax,%rdx,8),%rax
   0x0000000000449bad <+397>:   mov    $0x56aec0,%edx
=> 0x0000000000449bb2 <+402>:   mov    0x88(%rax),%rax
   0x0000000000449bb9 <+409>:   mov    %r10d,-0x48(%rbp)
   0x0000000000449bbd <+413>:   mov    %rax,(%rsp)
   0x0000000000449bc1 <+417>:   xor    %eax,%eax
   0x0000000000449bc3 <+419>:   callq  0x4423c0 <ff_printf>

I then inspected the registers and checked them against the code snippet. I was able to recover F_DB, p->cp and q->tq->queue_name from the registers. I observe that the value of %rax is 0x0, and that the seg fault happens before the call to ff_printf.

I have two questions:

1: How do I map this

"    => 0x0000000000449bb2 <+402>:  mov    0x88(%rax),%rax" 

to the code snippet?

I observe that the %rax is populated through

0x0000000000449b88 <+360>:  mov    -0x14(%r13),%rax

which I think is mov ( address of $r13 - 0x14) to %rax.

and

0x0000000000449ba9 <+393>:  mov    (%rax,%rdx,8),%rax

which I think is mov (address $rax + address $rdx + 8) to %rax. Am I right?

2: I am not sure whether there was any stack corruption. This seg fault is very rare and I cannot reproduce it. How can I trace it back further from here?

In gdb, run disassemble /m to see the C source intermingled with the disassembled instructions. (This will only work if the program was compiled with the -g option.) Can you print the value of p->data[i]? If it's zero, that would explain the seg fault. - Mark Plotnick
I ran disas /m in gdb. In fact, the assembly above is what it shows for the ff_printf line in the C source. I am able to print the values of p->cp and q->tq->queue_name, hence there is no problem with p->data[i]. I checked these values from the registers, as gdb reported the variables as optimized out. What confuses me is that gdb prints the value of q->tq->queue_name, yet the value in register %rax is null, which explains the seg fault. I don't understand why it is null or what operation the mov instruction is performing. I guess it's a race condition. - Kailash Akilesh
Yes, it's possible that the executable's debug info has partially incorrect info about the location of q, and q is actually being kept in %rax. That's why I was curious what the value of p->data[i] is. When I see a mov (reg,reg,8) instruction, it often corresponds to a C array access. - Mark Plotnick
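
To make the addressing modes discussed in these comments concrete, here is a minimal C sketch. In AT&T syntax an operand written disp(base,index,scale) refers to the address base + index*scale + disp, so mov (%rax,%rdx,8),%rax loads from rax + rdx*8 (not rax + rdx + 8), and mov 0x88(%rax),%rax loads from rax + 0x88. The names effective_address, load_scaled and struct fake_queue are hypothetical and exist only for illustration; the real layout of queue_t is not shown in the question, so which member sits at offset 0x88 is an assumption. The sketch also assumes 8-byte pointers, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address named by the AT&T operand disp(base,index,scale). */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   uintptr_t scale, intptr_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* Hypothetical stand-in for queue_t: only the offset of the member matters. */
struct fake_queue {
    char  pad[0x88];        /* filler so the next member lands at offset 0x88 */
    void *member_at_0x88;   /* whatever the real struct keeps at that offset */
};

/*
 * C equivalent of:  mov (%rax,%rdx,8),%rax
 * base plays the role of %rax before the load, index the role of %rdx.
 */
static void *load_scaled(void **base, size_t index)
{
    uintptr_t ea = effective_address((uintptr_t)base, index, 8, 0);
    assert(ea == (uintptr_t)&base[index]);  /* holds where sizeof(void *) == 8 */
    return *(void **)ea;                    /* same value as base[index] */
}
```

Read this way, the faulting mov 0x88(%rax),%rax with %rax == 0 is a read from address 0x88, which would be consistent with the pointer loaded by the previous array access being NULL at that moment.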

1 Answer

p = &v[QUEUE]

is wrong if QUEUE is the size of v, since the valid indices of v run from 0 to QUEUE-1.

So use

p = &v[QUEUE-1]

Or, in case you wanted to start from the beginning of v, use

p = v
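
As a minimal sketch of the indexing rule above: the question does not show how v or QUEUE are declared, so NQUEUES, vec_t and element_at below are hypothetical names, and the sketch only applies if QUEUE really is the element count rather than an index into v.

```c
#include <assert.h>
#include <stddef.h>

enum { NQUEUES = 4 };                /* hypothetical element count of v */

typedef struct { int used; } vec_t;  /* hypothetical element type */

static vec_t v[NQUEUES];

/* Return a pointer to v[idx], refusing indices outside 0 .. NQUEUES-1. */
static vec_t *element_at(size_t idx)
{
    assert(idx < NQUEUES);   /* &v[NQUEUES] is one past the end: not dereferenceable */
    return &v[idx];
}

/* Last valid element, the equivalent of p = &v[QUEUE-1]. */
static vec_t *last_element(void)
{
    return element_at(NQUEUES - 1);
}

/* First element, the equivalent of p = v. */
static vec_t *first_element(void)
{
    return v;
}
```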