I'm attempting to implement what can best be described as "an FTP interface to an HTTP API". Essentially, there is an existing REST API that can be used to manage a user's files for a site, and I'm building a mediator server that re-exposes this API as an FTP server. So you can log in with, say, FileZilla and list your files, upload new ones, delete old ones, etc.
I'm attempting this with twisted.protocols.ftp for the (FTP) server, and twisted.web.client for the (HTTP) client.
The thing I'm running up against is "streaming" a file from an HTTP response to my FTP response when a user tries to download it. Uploading has the same problem in the other direction.
The most straightforward approach would be to download the entire file from the HTTP server, then turn around and send the contents to the user. The problem with this is that any given file could be many gigabytes in size (think drive images, ISO files, etc). With this approach, the entire contents of the file would be held in memory between the time I download it from the API and the time I send it to the user - not good.
So my solution is to try to "stream" it - as I get chunks of data from the API's HTTP response, I just want to turn around and send those chunks along to the FTP user. Seems straightforward.
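To make the idea concrete, here is a minimal sketch in plain Python (no Twisted involved) of the kind of relaying I mean. The names `fake_http_chunks` and `relay` are made up for illustration and are not part of any library; the only point is that each chunk is forwarded as soon as it arrives, so no more than one chunk is held at a time.

```python
def fake_http_chunks(total_size, chunk_size):
    """Hypothetical stand-in for an HTTP response body arriving in pieces."""
    sent = 0
    while sent < total_size:
        size = min(chunk_size, total_size - sent)
        yield b"x" * size
        sent += size


def relay(chunks, write):
    """Forward each chunk to `write` as soon as it arrives.

    Returns (total bytes forwarded, size of the largest single chunk held).
    """
    total = 0
    peak = 0
    for chunk in chunks:
        write(chunk)  # hand the chunk straight to the downstream side
        total += len(chunk)
        peak = max(peak, len(chunk))
    return total, peak


# A list's append method plays the role of the FTP data connection here.
received = []
total, peak = relay(fake_http_chunks(10000, 4096), received.append)
```

In the real server the loop would of course be driven by the reactor rather than a generator, but the memory behaviour I'm after is the same: the peak is one chunk, not the whole file.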
For my "custom FTP functionality", I'm using a subclass of ftp.FTPShell. The reading method of this, openForReading, returns a Deferred that fires with an implementation of IReadFile.
Below is my (initial, simple) implementation for "streaming HTTP". I use the fetch function to set up an HTTP request, and the callback I pass in gets called with each chunk I get from the response.
I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP sides, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work: the consumer from the send call almost immediately closes the buffer, because a read returns an empty string when there's no data yet, and an empty read is treated as end-of-file. Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.
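Here is a small, Twisted-free sketch of that failure mode, next to the push-style shape I suspect I need instead. Everything in it is hypothetical and only for illustration: `drain_like_a_file_reader` imitates a sender that reads until it gets an empty result, `PushRelay` is loosely modelled on the idea of IReadFile's send(consumer), and `ListConsumer` stands in for the FTP data connection. A real implementation would presumably return a Deferred from send rather than take an `on_done` callback.

```python
import io


def drain_like_a_file_reader(fobj, write, chunk_size=4096):
    """Hypothetical pull-style sender: reads until read() returns nothing."""
    while True:
        data = fobj.read(chunk_size)
        if not data:  # an empty read is treated as end-of-file
            break
        write(data)


class PushRelay(object):
    """Hypothetical push-style alternative: the HTTP side calls
    chunk_received() and finished(); nothing is ever polled."""

    def __init__(self):
        self.consumer = None
        self.pending = []  # chunks that arrived before a consumer attached
        self.done = False
        self.on_done = None

    def send(self, consumer, on_done):
        # Attach the consumer and flush anything that arrived early.
        self.consumer = consumer
        self.on_done = on_done
        for chunk in self.pending:
            consumer.write(chunk)
        self.pending = []
        if self.done:
            on_done()

    def chunk_received(self, data):
        if self.consumer is None:
            self.pending.append(data)
        else:
            self.consumer.write(data)

    def finished(self):
        self.done = True
        if self.on_done is not None:
            self.on_done()


class ListConsumer(object):
    """Hypothetical stand-in for the FTP data connection."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


# Pull style: the buffer is still empty, so the "transfer" ends with no data.
pulled = []
drain_like_a_file_reader(io.BytesIO(), pulled.append)

# Push style: the consumer attaches first, and the chunks arrive later.
consumer = ListConsumer()
completed = []
push = PushRelay()
push.send(consumer, lambda: completed.append(True))
push.chunk_received(b"first ")
push.chunk_received(b"second")
push.finished()
```

The pull-style loop finishes immediately with nothing sent, which matches the empty files I'm seeing, while the push-style object only signals completion once the HTTP side says it is finished.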
Am I close, but missing something? Am I on the wrong path altogether? Is what I want to do really impossible (I highly doubt that)?
import urlparse

from twisted.internet import reactor
from twisted.web import client


class HTTPStreamer(client.HTTPPageGetter):
    """Page getter that also hands each response chunk to registered callbacks."""

    def __init__(self):
        self.callbacks = []

    def addHandleResponsePartCallback(self, callback):
        self.callbacks.append(callback)

    def handleResponsePart(self, data):
        # Pass the chunk to every listener, then let the base class handle it.
        for cb in self.callbacks:
            cb(data)
        client.HTTPPageGetter.handleResponsePart(self, data)


class HTTPStreamerFactory(client.HTTPClientFactory):
    protocol = HTTPStreamer

    def __init__(self, *args, **kwargs):
        client.HTTPClientFactory.__init__(self, *args, **kwargs)
        self.callbacks = []

    def addChunkCallback(self, callback):
        self.callbacks.append(callback)

    def buildProtocol(self, addr):
        # Copy the factory-level callbacks onto each new protocol instance.
        p = client.HTTPClientFactory.buildProtocol(self, addr)
        for cb in self.callbacks:
            p.addHandleResponsePartCallback(cb)
        return p


def fetch(url, callback):
    parsed = urlparse.urlsplit(url)
    # HTTPClientFactory expects the full URL (it derives the Host header
    # from it), so pass url rather than just parsed.path.
    f = HTTPStreamerFactory(url)
    f.addChunkCallback(callback)
    reactor.connectTCP(parsed.hostname, parsed.port or 80, f)
As a side note, this is only my second day with Twisted - I spent most of yesterday reading through Dave Peticolas' Twisted Introduction, which has been a great starting point, even if it is based on an older version of Twisted.
That said, I may be doing things wrong.