I'm trying to create a small web server that uses WebKit to load a URL and extract some data from the web page (e.g. title, image sizes...).
I'm using PyQt4 to access WebKit from Python. For each request, I create a QThread that:
- creates a QWebPage object,
- runs an event loop,
- when the web page has finished loading (loadFinished signal), runs some code that extracts data from the mainFrame of the QWebPage and then exits the QThread.
This works very well the first time: the web page is loaded, including all its resources (CSS, images). The second time I ask the server to load a URL, the web page is loaded, but none of its resources are (no CSS, no images). So when I try to retrieve image sizes, every size comes back as 0,0.
Here is a code snippet:
    from PyQt4.QtCore import QThread, QUrl
    from PyQt4.QtWebKit import QWebPage

    # The QThread responsible for loading the web page
    class WebKitThread(QThread):
        def __init__(self, url):
            QThread.__init__(self)
            self.url = url
            self.start()

        def run(self):
            # The page is created inside the worker thread, then the thread's
            # event loop runs until _loadFinished calls exit()
            self.webkitParser = WebKitParser(self.url)
            self.exec_()

    class WebKitParser(QWebPage):
        def __init__(self, url, parent=None):
            QWebPage.__init__(self, parent)
            self.loadFinished.connect(self._loadFinished)
            self.mainFrame().load(QUrl(url))

        def _loadFinished(self, result):
            self.computePageProperties()
            # Stop the event loop of the thread this page lives in
            QThread.currentThread().exit()

        def computePageProperties(self):
            # Some custom code that reads title, image size...
            self.computedTitle=XXXXXXXX
The calling code (which responds to the HTTP request) executes:
    t = WebKitThread(url)
    t.wait()
    # do some stuff with properties of WebKitParser
    print t.webkitParser.computedTitle
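
The body of computePageProperties is elided above. To make the 0,0 symptom concrete, here is a minimal sketch of how the image sizes might be read, assuming they come from the rendered geometry of each img element in the frame. The helper name image_sizes is hypothetical and is not part of my real code; it only relies on QWebFrame.findAllElements, QWebElementCollection.toList, QWebElement.attribute and QWebElement.geometry.

```python
def image_sizes(frame):
    """Return a list of (src, width, height) tuples for every <img> in a frame.

    `frame` is assumed to be a QWebFrame (e.g. page.mainFrame()); any object
    exposing the same findAllElements()/toList() interface works too.
    Hypothetical helper, for illustration only.
    """
    sizes = []
    # findAllElements takes a CSS selector and returns a QWebElementCollection
    for img in frame.findAllElements("img").toList():
        # geometry() is the rendered rectangle of the element, so an image
        # that was never fetched will typically report 0 x 0 here
        rect = img.geometry()
        # attribute() may return a QString under PyQt4's default API on Python 2
        sizes.append((img.attribute("src"), rect.width(), rect.height()))
    return sizes
```

Inside computePageProperties this would be called as image_sizes(self.mainFrame()). On the second request, a helper like this is where every width and height comes back as 0.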