
A segfault occasionally occurs when I call deadline_timer::async_wait in my method SendMessageAgain. It can happen in one of two ways; I've included both backtraces below. It appears to be random, which makes me think a race condition is involved somehow. I have a class with an io_service object, ioService, and multiple threads that each have timers hooked to ioService. Do I need to lock ioService before calling async_wait? I thought it handled that.

Maybe it has something to do with whatever code gets interrupted when the timer ticks. What is the proper way to set up deadline timers when other code is executing as well?

The code I use in SendMessageAgain is:

void Node::SendMessageAgain(unsigned long seqNum) {
  // figure out if and what to send (using object fields)
  if (should_send_again) {
    Send(...);
    timer->expires_from_now(INTERVAL);
    timer->async_wait(bind(&Node::SendMessageAgain, this, seqNum));
  }
}

First backtrace:
#0  0x08060609 in boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0x14, impl=..., handler=...)
    at /usr/include/boost/asio/detail/deadline_timer_service.hpp:170
#1  0x0805e2e7 in boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0x0, impl=..., 
    handler=...) at /usr/include/boost/asio/deadline_timer_service.hpp:135
#2  0x0805bcbb in boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0x807fc50, handler=...)
    at /usr/include/boost/asio/basic_deadline_timer.hpp:435
#3  0x080555a2 in Node::SendMessageAgain (this=0xbfffefdc, seqNum=8)
    at node.cpp:147

Second backtrace:
#0  __pthread_mutex_lock (mutex=0x2f200c4) at pthread_mutex_lock.c:50
#1  0x08056c79 in boost::asio::detail::posix_mutex::lock (this=0x2f200c4)
    at /usr/include/boost/asio/detail/posix_mutex.hpp:52
#2  0x0805a036 in boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::scoped_lock (this=0xbfffeb04, m=...)
    at /usr/include/boost/asio/detail/scoped_lock.hpp:36
#3  0x08061dd0 in boost::asio::detail::epoll_reactor::schedule_timer<boost::asio::time_traits<boost::posix_time::ptime> > (this=0x2f200ac, queue=..., 
    time=..., timer=..., op=0x807fca0)
    at /usr/include/boost/asio/detail/impl/epoll_reactor.hpp:43
#4  0x0806063a in boost::asio::detail::deadline_timer_service<boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0xb6b005fc, impl=..., 
    handler=...)
    at /usr/include/boost/asio/detail/deadline_timer_service.hpp:170
#5  0x0805e2fd in boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0xb6b005e8, 
    impl=..., handler=...)
    at /usr/include/boost/asio/deadline_timer_service.hpp:135
#6  0x0805bcd1 in boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >::async_wait<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Node, unsigned long>, boost::_bi::list2<boost::_bi::value<Node*>, boost::_bi::value<unsigned long> > > > (this=0xb6b005a0, handler=...)
    at /usr/include/boost/asio/basic_deadline_timer.hpp:435
#7  0x080555a7 in Node::SendMessageAgain (this=0xbfffefdc, seqNum=9)
    at node.cpp:148
Chad: Is it possible that your Node is going out of scope while the asynchronous task is still pending?
Nick: I don't think so. In main I do Node n; n.Create(). Create spawns a couple of threads and then calls io_service::run(), which never returns in my case, since I have a periodic timer that I never cancel. Could it be a problem that other threads are creating timers on the same io_service?
Sam Miller: This sounds like an object lifetime issue; post an SSCCE.
Sam Miller: @Nick, you can create deadline_timer objects from multiple threads using a single io_service. Have you tried running under valgrind?
Nick: Hmm, I think I may have caught it. If so, it was an object lifetime issue (not with Node) brought about by a race condition.

2 Answers


All right, it's fixed. As Sam suggested, it was an object lifetime issue, though not with Node. I had a timer in an object owned by Node. There was a race condition in my code: I was resetting the timer outside the critical section, and its owner was sometimes destroyed between the end of the critical section and the resetting of the timer. I simply expanded the critical section to cover the timer reset, and that fixed it.

I'm not sure why the segfault manifested itself so far down in async_wait, though, since the callback (and its associated this pointer) belonged to Node, which still existed.


Bad:

timer->async_wait(bind(&Node::SendMessageAgain, this, seqNum));

Good:

timer->async_wait(bind(&Node::SendMessageAgain, shared_from_this(), seqNum));

Have Node derive from boost::enable_shared_from_this:

class Node : public boost::enable_shared_from_this<Node>

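This could fix the issue if it is caused by Node being destroyed while a callback is still scheduled for it. Note that shared_from_this() only works when the Node is already owned by a boost::shared_ptr, so a stack-allocated Node n; as described in the comments would have to be created with boost::make_shared<Node>() instead.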