I have an application running on Solaris 8 (SunOS 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-880). It ran fine for several days until recently, when it crashed. A watchdog module restarted the application after the crash, but it then ran and crashed again and again. After examining the core dumps, I found that it crashes in system calls such as poll, write and send. I examined the contents of the variables passed to those functions and they looked fine. I have no idea how to troubleshoot this. Can anyone give some guidance on where to proceed? Thanks in advance.

Below is one of the core dumps, from a crash in poll:

bash$ gdb applx applx.core
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.5), Copyright 1996 Free Software Foundation, Inc...

warning: exec file is newer than core file.
Core was generated by `applx -h'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/lib/libgen.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1...done.
#0 0xff219ec4 in _libc_poll ()
(gdb) bt
#0 0xff219ec4 in _libc_poll ()
#1 0xff1cccac in _select ()
#2 0x1cf08 in loop () at /home/ian123/applx/src/task.c:1450
#3 0x1e0d4 in state_start (local=0) at /home/ian123/applx/src/state.c:1047
#4 0x1a0f4 in main (argc=537600, argv=0x83400)
at /home/ian123/applx/src/main.c:578
(gdb) up
#1 0xff1cccac in _select ()
(gdb) up
#2 0x1cf08 in loop () at /home/ian123/applx/src/task.c:1450
1450 r = select(maxfd, rfdsp, wfdsp, efdsp, tvp);
(gdb) p maxfd
$1 = 23
(gdb) p rfdsp
$2 = (fd_set *) 0xb8020
(gdb) p wfdsp
$3 = (fd_set *) 0x0
(gdb) p efdsp
$4 = (fd_set *) 0x0
(gdb) p tvp
$5 = (struct timeval *) 0xb81a0
(gdb) p *rfdsp
$6 = {fds_bits = {7610424, 0 }}
(gdb) p *tvp
$7 = {tv_sec = 0, tv_usec = 380002}

Just to eliminate possibilities, is the program multithreaded (in which case you might be looking at the stack for the wrong thread)? - nos
Hi nos, it's single threaded. Thank you. - user502865
@user502865 In that case I'd start searching for buffer overflows/heap corruption somewhere. - nos

2 Answers

2
votes

When I'm investigating a segfault and I have no idea where it's happening, I use the following gdb command:

x/1i <program_counter>

(Replace <program_counter> with your architecture's ...(drum roll)... program counter register, e.g. $eip on x86. gdb also accepts $pc as a generic name for it on every target, SPARC included.)

That shows the faulting instruction. From there I examine registers that contain memory addresses.
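
As a rough sketch of what that looks like against the core file from the question: the commands below are standard gdb commands, but the choice of registers is an assumption. $o0 and $o1 are the SPARC outgoing-argument registers, which seemed like a reasonable place to start for a crash inside a libc call. No output is shown because it depends on the core.

```
(gdb) x/i $pc          # disassemble the instruction at the faulting address
(gdb) info registers   # dump all registers; look for ones holding addresses
(gdb) p/x $o0          # SPARC-specific: first outgoing argument register
(gdb) p/x $o1          # SPARC-specific: second outgoing argument register
(gdb) x/4xw $sp        # peek at the first words on the stack
```

If the faulting instruction is a load or store, the register it uses as its address is the one to check first: an obviously bogus value there (0, a small integer, or text-looking bytes) usually points at a corrupted pointer rather than at the system call itself.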

0
votes

If GDB can show you the source code where the segmentation fault occurred, that may quickly lead to an understanding of the problem.
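
A minimal sketch of how to get at that source, assuming the binary was built with debug information (the backtrace in the question shows file and line numbers for the application frames, so it appears to be). The frame number 2 is taken from that backtrace, where it is loop() in task.c.

```
(gdb) bt               # find the innermost frame that belongs to your code
(gdb) frame 2          # select it (loop() in the question's backtrace)
(gdb) list             # show the source lines around the current line
(gdb) info locals      # print the local variables of that frame
(gdb) info frame       # show the frame's saved registers and addresses
```

One caveat: the session in the question prints "warning: exec file is newer than core file". If the binary was rebuilt after the core was written, the line numbers and variable values gdb reports for the application frames may not match what actually ran, so it is worth repeating this against the exact binary that crashed.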