
I'm trying to create a small web server that uses WebKit to load a URL and extract some data from the web page (e.g. title, image sizes...).

I'm using PyQt4 to access WebKit from Python. For each request, I create a QThread that:
- creates a QWebPage object,
- runs an event loop,
- when the page has finished loading (loadFinished signal), extracts data from the mainFrame of the QWebPage and exits the QThread.

This works very well the first time: the web page is loaded, including all its resources (CSS, images). The second time I ask the server to load a URL, the web page is loaded, but none of its resources are (no CSS, no images). So when I try to retrieve image sizes, every size is reported as 0,0.

Here is a code snippet:

from PyQt4.QtCore import QThread, QUrl
from PyQt4.QtWebKit import QWebPage

# The QThread responsible for loading the web page
class WebKitThread(QThread):
    def __init__(self, url):
        QThread.__init__(self)
        self.url = url
        self.start()
    def run(self):
        self.webkitParser = WebKitParser(self.url)
        self.exec_()

class WebKitParser(QWebPage):
    def __init__(self, url, parent=None):
        QWebPage.__init__(self, parent )
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))

    def _loadFinished(self, result):
        self.computePageProperties()
        QThread.currentThread().exit()

    def computePageProperties(self):
        # Some custom code that reads title, image size...
        self.computedTitle=XXXXXXXX

The calling code (which responds to the HTTP request) executes:

t = WebKitThread(url)
t.wait()
# do some stuff with properties of WebKitParser
print t.webkitParser.computedTitle

1 Answer


I've managed to fix the issue by creating the QWebPage in the GUI thread (the thread that runs the QApplication event loop).

It seems that the second time a QWebPage is used, it tries to access the browser cache (even if the cache has been disabled by configuration). But if the first QWebPage was not created in the main GUI thread, the cache is somehow misconfigured and unusable.

To create the QWebPage in the main GUI thread, I use a custom QEvent (of type QEvent.User) that triggers the QWebPage initialization and the fetching of the result.
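
The hand-off described above (a request thread asks the single owner thread to do the page work, then waits for the result) can be sketched without Qt, using only the standard library. This is an illustration of the pattern, not the code I actually run: PageRequest, load_page, owner_loop and fetch are hypothetical names, the queue stands in for QCoreApplication.postEvent (which is thread-safe and delivers the event in the receiver's thread), and load_page stands in for creating the QWebPage and reading its properties.

```python
import queue
import threading


class PageRequest(object):
    """Plays the role of the custom QEvent: carries the URL in and the result out."""

    def __init__(self, url):
        self.url = url
        self.result = None
        self.done = threading.Event()  # set once the owner thread has filled in result


def load_page(url):
    # Hypothetical stand-in for "create a QWebPage, load the URL, extract data".
    # It records which thread did the work so the hand-off can be checked.
    return {"url": url, "thread": threading.current_thread().name}


def owner_loop(requests):
    """Runs in the one thread that owns all pages (the GUI thread in the Qt case)."""
    while True:
        request = requests.get()
        if request is None:  # sentinel: stop the loop
            break
        request.result = load_page(request.url)
        request.done.set()


def fetch(requests, url):
    """Called from any request-handling thread; blocks until the owner has finished."""
    request = PageRequest(url)
    requests.put(request)  # analogous to posting the custom event
    request.done.wait()    # analogous to waiting for the result
    return request.result


def main():
    requests = queue.Queue()
    owner = threading.Thread(target=owner_loop, args=(requests,), name="owner")
    owner.start()
    print(fetch(requests, "http://example.com/"))
    requests.put(None)
    owner.join()


if __name__ == "__main__":
    main()
```

In the real server the owner is the thread running the QApplication event loop rather than a separate thread, and the work is done in the event handler of a QObject that lives in that thread. The important property is the same: every page is created and used by one thread only, whichever thread the HTTP request arrived on.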