3
votes

I've been writing threaded networking code for a game, but the server dies randomly. My test has several clients connecting, sending a bunch of packets, disconnecting, and then connecting again.

I am using C++ with SFML/Network and SFML/System threads. One thread on the server listens for connections; once a connection is established, it creates two new threads for sending and receiving packets. The event handler and the send/receive threads share data through two std::queues. I've been trying to debug the crash with gdb, but I'm not that experienced with it, so I'm looking for help.

Here is the gdb console output when the crash happens:

OUT: 10 1 HELLO
IN: 10 0 LOLOLOL
OUT: 10 1 HELLO
IN: 10 0 LOLOLOL
OUT: 10 1 HELLO
Out thread killed by in thread!
In thread died!
New client connected!
[Thread 0x34992b70 (LWP 16167) exited]
[New Thread 0x3118bb70 (LWP 16186)]
terminate called without an active exception

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x35193b70 (LWP 16166)]
0x00110416 in __kernel_vsyscall ()

Here is the backtrace:

(gdb) backtrace
#0  0x00110416 in __kernel_vsyscall ()
#1  0x46a0967f in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x46a0afb5 in abort () at abort.c:92
#3  0x47b8af0d in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x47b88c84 in __cxxabiv1::__terminate (handler=0x47b8adc0 <__gnu_cxx::__verbose_terminate_handler()>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:40
#5  0x47b88cc0 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:50
#6  0x47b8878f in __cxxabiv1::__gxx_personality_v0 (version=1, actions=10, exception_class=890844228, ue_header=0x35193dc0, context=0x35192ea0)
    at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:669
#7  0x46bdbfbe in _Unwind_ForcedUnwind_Phase2 (exc=0x35193dc0, context=0x35192ea0) at ../../../gcc/unwind.inc:175
#8  0x46bdc3a9 in _Unwind_ForcedUnwind (exc=0x35193dc0, stop=0x46b76fc0 <unwind_stop>, stop_argument=0x35193444) at ../../../gcc/unwind.inc:207
#9  0x46b794e2 in _Unwind_ForcedUnwind (exc=0x35193dc0, stop=0x46b76fc0 <unwind_stop>, stop_argument=0x35193444) at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:132
#10 0x46b77141 in __pthread_unwind (buf=<optimized out>) at unwind.c:130
#11 0x46b6f5bb in __do_cancel () at ../nptl/pthreadP.h:265
#12 sigcancel_handler (sig=<optimized out>, si=<optimized out>, ctx=<optimized out>) at nptl-init.c:202
#13 sigcancel_handler (sig=32, si=0x35192f7c, ctx=0x35192ffc) at nptl-init.c:155
#14 <signal handler called>
#15 0x08049930 in out (data=0xb761c798) at src/layer7.cpp:40
#16 0x0804b8d7 in sf::priv::ThreadFunctorWithArg<void (*)(networkdata*), networkdata*>::Run (this=0xb761c7c8) at /usr/local/include/SFML/System/Thread.inl:48
#17 0x00116442 in sf::Thread::Run() () from /home/toni/ProjectRepos/sfml/build/lib/libsfml-system.so.2
#18 0x001166df in sf::priv::ThreadImpl::EntryPoint(void*) () from /home/toni/ProjectRepos/sfml/build/lib/libsfml-system.so.2
#19 0x46b70c5e in start_thread (arg=0x35193b70) at pthread_create.c:305
#20 0x46ab4b4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133

Here is the thread code from src/layer7.cpp:

void out(networkdata * data) {
    bool running = true;
    while(running) {
        if(data->pipe_out->pipe_empty() == false) {
            sf::Packet packet = data->pipe_out->pop_message();
            if(data->socket->Send(packet) == sf::Socket::Disconnected) {
                data->thread_in->Terminate();
                std::cout << "In thread killed by out thread!" << std::endl;
                running = false;
            }
        }
    }
    std::cout << "Out thread died!" << std::endl;
}
  • Line 40 is the first if statement after while(running).
  • data->pipe_out->pipe_empty() calls empty() on the queue.
  • data->pipe_out->pop_message() pops the front element from the queue.
  • The thread then sends the packet and checks whether the socket has been disconnected.
  • If the socket is disconnected, it terminates the "in" thread and stops its own loop.
2
Where are the locks around data? - Lightness Races in Orbit
Switch threads in gdb to get a different backtrace, maybe. - Lightness Races in Orbit
Actually, I realized just now, before checking your comments, that I should probably lock data. :P - TMKCodes
Yeah, stupid me. It seems the bug was there. It seems to be working now: it has been running for 5 minutes already, and before it crashed in under a minute. - TMKCodes
You are getting a signal (look at the line that says "signal handler called"). I'm not sure which one; I don't trust the "sig=32" bit. This usually terminates your thread. Perhaps you should try using your own signal handlers. - n. 1.8e9-where's-my-share m.

2 Answers

3
votes

You need locks around data to protect against concurrent access to the same data structure from multiple threads.
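
As a rough illustration of what such locking could look like, here is a minimal sketch of a mutex-protected queue. It assumes a C++11 compiler and uses std::mutex so that it is self-contained; SFML's own sf::Mutex and sf::Lock could play the same role. The class name LockedPipe and the methods push_message and try_pop_message are hypothetical stand-ins for the question's pipe class, not its actual interface.

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical replacement for the question's pipe class (which wraps a
// std::queue of packets). Every access to the queue holds the same mutex.
template <typename T>
class LockedPipe {
public:
    // Called by the producing thread.
    void push_message(T message) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(message));
    }

    // Called by the consuming thread. The emptiness check and the pop happen
    // under one lock, so no other thread can modify the queue in between.
    // Returns false and leaves `out` untouched when the queue is empty.
    bool try_pop_message(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> queue_;
};
```

Merging the check and the pop into one call is a deliberate choice: locking pipe_empty() and pop_message() separately would still leave a gap between the two calls if more than one thread ever consumes from the same queue.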

0
votes

One possible cause is an exception: exceptions should be caught within the thread. Also, it looks like data->thread_in->Terminate() sends a cancellation request; make sure that all established cancellation handlers work correctly in that case.
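
One way to sidestep the cancellation path seen in the backtrace (__do_cancel followed by a forced unwind) is to let each thread stop itself instead of being terminated from outside. The following is only a sketch under assumptions: it needs C++11, and Connection, running, and run_until_stopped are hypothetical names that do not come from the question's code or from SFML.

```cpp
#include <atomic>
#include <thread>

// Hypothetical per-connection state shared by the "in" and "out" threads.
struct Connection {
    std::atomic<bool> running{true};
};

// Runs one_step repeatedly until the shared flag is cleared. one_step stands
// in for "pop a packet and send it" and returns false on disconnect. Instead
// of terminating the peer thread, the loop clears the flag so that the peer
// leaves its own loop at its next check. Returns the number of completed steps.
template <typename Step>
int run_until_stopped(Connection& conn, Step one_step) {
    int completed = 0;
    while (conn.running.load()) {
        if (!one_step()) {
            conn.running.store(false);  // ask the peer thread to stop too
            break;
        }
        ++completed;
        std::this_thread::yield();  // avoid monopolizing a core while polling
    }
    return completed;
}
```

With this shape, both threads return normally from their entry functions, so destructors run and no thread is cancelled while it holds a lock. A thread blocked in a socket call would still need a timeout or a socket close to notice the flag; that part is not shown here.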