2
votes

I have an application written in C++ & Qt that does a lot of network requests. The basic outline of my code is below:

{
    QNetworkReply* reply = networkAccessManager().get( QNetworkRequest( url ) );
    assert( reply );

    connect( reply, &QNetworkReply::finished, [=]
    {
        // do action based on the contents of the reply

        assert( reply->isFinished() );
        reply->deleteLater();
    });
}

The code keeps multiple requests in flight at the same time. Both of the asserts have never fired.

Randomly (about every 200000 requests), the deferred delete of this reply fails with something that appears to be an double free. This happens both in Qt 5.0.2 and Qt 5.2.x. I've ran valgrind with the following result:

==18792== Invalid read of size 8
==18792==    at 0x53AAC7A: QObject::~QObject() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x4EB60A8: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Network.so.5.0.2)
==18792==    by 0x53A4357: QObject::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537EBBC: QCoreApplication::notify(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537E8BD: QCoreApplication::notifyInternal(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5380AC5: QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x53C38D4: QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537D88A: QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x51F422A: QThread::exec() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x51F8A4A: ??? (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5812B4F: start_thread (pthread_create.c:304)
==18792==    by 0x62A1A7C: clone (clone.S:112)
==18792==  Address 0xb9fd670 is 0 bytes inside a block of size 16 free'd
==18792==    at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==18792==    by 0x53A4357: QObject::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537EBBC: QCoreApplication::notify(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537E8BD: QCoreApplication::notifyInternal(QObject*, QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x5380AC5: QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x53C38D4: QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x537D88A: QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x538115F: QCoreApplication::exec() (in /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.0.2)
==18792==    by 0x4093C4: main (main.cpp:38)

I think that the following things are certainly true:

  • The main thread deletes an resource.
  • The network thread then tries to delete the same resource. Which then results in an double delete. Which fails in some way

I think, but cannot verify, that the following things are true:

  • this resource is an QNetworkReply. It could be some Qt internal resource that i do not know about. Apart from this QNetworkReply, my codebase does not contain any QObjects that are frequently created or destructed.

I'm having difficulties in my attempts to solve this error. From the inspection of the stack trace, it appears the reply->deleteLater() signal is somehow delivered to both the network thread and the main thread. But i don't see how that could be the case. The signal and slot programming style makes it very hard to see where exactly something goes wrong.

How would i approach debugging this error?


An answer mentions a possible origin in synchronization. In my codebase only 1 class is permitted to be called from a different thread. The functions from this class fall in 2 categories:

  1. query some internal state.
  2. emit an signal.

The second category is implemented as:

class Foo {
  Q_OBJECT
public:
  void Foo() {
    connect( foo, &Foo::doSomethingSignal, this, &Foo:doSomethingInternal, Qt::QueuedConnection );
  }

  // this functions gets called from various threads
  void doSomething() {
    emit( doSomethingSignal() );
  }

private slots:
  // this function happens synchronized in the main thread
  void doSomethingInternal() {
    ...
  }

signals:
  void doSomethingSignal();
}

According to this stackoverflow question: emit Qt signal from non Qt Thread or ouside Qt main event loop with at 4.5 this is safe. The caller is not an QObject.

2
Say if you do find the error, the issue now is if the fix easy or hard. For issues like this, I would think that changing the code to use a std::shared_ptr may be what's needed. stackoverflow.com/questions/9127816/… You just need to know what the shared resource is, create a shared_ptr<shared_resource> and use that. - PaulMcKenzie
I don't think that you have knowledge about how Qt's resource system works? The QNetworkReply is owned by the network thread, and should be deleted using the deleteLater() slot. The network reply is then deleted the next time the network thread returns to the event loop. There is no resource leak, since QNetworkReply::finished is always called even if the reply timed out. - Stefan
I assume you've tried commenting out the reply->deleteLater() line and verified that the memory corruption no longer occurs? At least until you run out of memory anyway ... - njahnke
Why don't you post the whole code (main and network threads)? It is hard to tell more without seeing more code. - lpapp
@Laszlo The network thread is a concept that exists completely inside the Qt library. - Stefan

2 Answers

1
votes

Answering my own question:

I've made a test case with fairly minimum functionality. After some testing, it appears that the bug exists in my test case. Which is why i have made a bug report: https://bugreports.qt-project.org/browse/QTBUG-38309

The testcase can be viewed here: https://bitbucket.org/sdessens/qnetworkreply-access-violation-testcase/overview

There exists a workaround that involves waiting a few hundred milliseconds before deleting the reply, this works fine in the test case, but not in my application for some reason (after a few minutes, the event loop appears to stop working). The impact of random crashes is not earth scattering, so for now i am sticking a a while loop in bash to keep my application running until the Qt devs resolve this issue.

0
votes

From your Valgrind error report, it looks like program us trying to read the memory after some other thread has freed it. It does not looks like double free scenario rather than use after free.

==18792== Invalid read of size 8

==18792== Address 0xb9fd670 is 0 bytes inside a block of size 16 free'd

You may check out my earlier post on Valgrind and how GDB/Valgrind can be used together to perform the live debugging at the time of first error reported by your program.

This problem seems to due to synchronization problem between threads and some time one thread is freeing it which other thread is not aware of. Memory related problem in multi-threaded environments are very difficult to understand and resolve. You may want to think of using the suggestion provided in the above comments(smart pointer in C++/RAII based classes).