
I'm attempting to implement what can best be described as "an FTP interface to an HTTP API". Essentially, there is an existing REST API that can be used to manage a user's files for a site, and I'm building a mediator server that re-exposes this API as an FTP server. So you can log in with, say, FileZilla and list your files, upload new ones, delete old ones, etc.

I'm attempting this with twisted.protocols.ftp for the (FTP) server, and twisted.web.client for the (HTTP) client.

Where I'm stuck is downloads: when a user requests a file, I need to "stream" that file from the HTTP response into my FTP response. Uploads have the same problem in the other direction.

The most straightforward approach would be to download the entire file from the HTTP server and then send the contents to the user. The problem is that any given file could be many gigabytes (think drive images, ISO files, etc.), and with this approach the whole file would be held in memory between the time I download it from the API and the time I send it to the user. Not good.

So my solution is to "stream" it: as I receive chunks of data from the API's HTTP response, I want to pass each chunk straight along to the FTP user. Seems straightforward.
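To make the memory argument concrete, here is a framework-free sketch comparing the two strategies. The `body_chunks` generator is a hypothetical stand-in for an HTTP response arriving in pieces, and `discard` stands in for the FTP side; the only thing measured is the peak amount of data held at once.

```python
def body_chunks(total, chunk_size):
    """Yield `total` bytes in pieces of at most `chunk_size` bytes,
    roughly the way an HTTP client hands over a response body."""
    remaining = total
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield b"\0" * n
        remaining -= n


def download_then_send(chunks, write):
    """Collect the whole body first; return the peak bytes held at once."""
    body = b"".join(chunks)   # the entire file sits in memory here
    write(body)
    return len(body)


def stream_through(chunks, write):
    """Forward each piece as it arrives; return the peak bytes held at once."""
    peak = 0
    for chunk in chunks:      # one iteration ~ one per-chunk callback
        peak = max(peak, len(chunk))
        write(chunk)          # no reference is kept after the hand-off
    return peak


def discard(data):
    """Stand-in for the FTP data connection."""
    pass


SIZE = 1000000
CHUNK = 64 * 1024
buffered_peak = download_then_send(body_chunks(SIZE, CHUNK), discard)
streamed_peak = stream_through(body_chunks(SIZE, CHUNK), discard)
print(buffered_peak)  # 1000000
print(streamed_peak)  # 65536
```

With streaming, the peak is bounded by the chunk size rather than the file size, which is what makes multi-gigabyte files feasible.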

For my custom FTP functionality, I'm using a subclass of ftp.FTPShell. Its reading method, openForReading, returns a Deferred that fires with an implementation of IReadFile.

Below is my (initial, simple) implementation of "streaming HTTP". The fetch function sets up an HTTP request, and the callback I pass in is called with each chunk of the response body.

I thought I could use some sort of two-ended buffer object to carry the chunks between HTTP and FTP, passing the buffer as the file-like object that ftp._FileReader requires. That is quickly proving not to work: once send() starts the transfer, the buffer is closed almost immediately, because reading from it returns an empty string (no data has arrived yet). As a result, I'm "sending" empty files before I even start receiving the HTTP response chunks.
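Why the buffer closes early can be seen in a simplified, framework-free model of the pull loop behind ftp._FileReader, which hands the file object to Twisted's basic.FileSender; that sender ends the transfer as soon as read() returns an empty string. `TwoEndedBuffer` and `pull_until_eof` are hypothetical names for this illustration only, and each loop iteration stands in for one resumeProducing() call.

```python
class TwoEndedBuffer(object):
    """Hypothetical pipe-like buffer: the HTTP side feeds, the FTP side reads."""

    def __init__(self):
        self._data = bytearray()

    def feed(self, chunk):
        self._data += chunk

    def read(self, size):
        out = bytes(self._data[:size])
        del self._data[:size]
        return out


def pull_until_eof(file_obj, write, chunk_size=16):
    """Simplified pull loop in the style of basic.FileSender: keep calling
    read() and treat an empty result as end-of-file."""
    sent = 0
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:         # b"" is taken to mean "file finished"
            return sent
        write(chunk)
        sent += len(chunk)


received = []
buf = TwoEndedBuffer()
# The FTP transfer starts before the HTTP response has delivered anything...
sent = pull_until_eof(buf, received.append)
# ...so by the time data shows up, the "file" has already been sent, empty.
buf.feed(b"first HTTP chunk")
print(sent, received)  # 0 []
```

A pull-based reader cannot distinguish "no data yet" from "no more data", which is exactly the empty-file symptom described above.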

Am I close, but missing something? Am I on the wrong path altogether? Is what I want to do really impossible (I highly doubt that)?

from twisted.web import client
import urlparse

class HTTPStreamer(client.HTTPPageGetter):
    def __init__(self):
        self.callbacks = []

    def addHandleResponsePartCallback(self, callback):
        self.callbacks.append(callback)

    def handleResponsePart(self, data):
        # Called once per chunk of the response body.
        for cb in self.callbacks:
            cb(data)
        # Note: the base implementation also appends each chunk to an
        # internal in-memory buffer, so the full body still accumulates.
        client.HTTPPageGetter.handleResponsePart(self, data)

class HTTPStreamerFactory(client.HTTPClientFactory):
    protocol = HTTPStreamer

    def __init__(self, *args, **kwargs):
        client.HTTPClientFactory.__init__(self, *args, **kwargs)
        self.callbacks = []

    def addChunkCallback(self, callback):
        self.callbacks.append(callback)

    def buildProtocol(self, addr):
        # Attach the chunk callbacks to each protocol instance we build.
        p = client.HTTPClientFactory.buildProtocol(self, addr)
        for cb in self.callbacks:
            p.addHandleResponsePartCallback(cb)
        return p

def fetch(url, callback):

    parsed = urlparse.urlsplit(url)

    # HTTPClientFactory expects the full URL; it derives the request
    # path and the Host header from it.
    f = HTTPStreamerFactory(url)
    f.addChunkCallback(callback)

    from twisted.internet import reactor
    reactor.connectTCP(parsed.hostname, parsed.port or 80, f)
    # Fires when the response is complete (errbacks on failure).
    return f.deferred

As a side note, this is only my second day with Twisted. I spent most of yesterday reading through Dave Peticolas' Twisted Introduction, which has been a great starting point, even if it's based on an older version of Twisted.

That said, I may be doing things wrong.

1 Answer

> I thought I could use some sort of two-ended buffer object to transport the chunks between the HTTP and FTP, by using the buffer object as the file-like object required by ftp._FileReader, but that's quickly proving not to work, as the consumer from the send call almost immediately closes the buffer (because it's returning an empty string, because there's no data to read yet, etc). Thus, I'm "sending" empty files before I even start receiving the HTTP response chunks.

Instead of using ftp._FileReader, you want something that writes to the consumer whenever your HTTPStreamer delivers a chunk to the callback it was given. You never need (or want) to read from a buffer on the HTTP side, because there's no reason to have such a buffer at all: as soon as HTTP bytes arrive, write them to the consumer. Something like:

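from zope.interface import implementer
from twisted.protocols.ftp import IReadFile

@implementer(IReadFile)
class FTPStreamer(object):

    def __init__(self, url):
        self.url = url

    def send(self, consumer):
        # Push each HTTP chunk straight into the FTP consumer.
        # Assumes fetch() is changed to end with `return f.deferred`;
        # HTTPClientFactory fires that Deferred when the response is
        # complete, which tells the FTP implementation you're done.
        d = fetch(self.url, consumer.write)
        # Only the firing matters to the FTP side; discard the page
        # body that HTTPPageGetter passes along as the result.
        d.addCallback(lambda _: None)
        return d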
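The key difference from the two-ended buffer is that nothing ever polls for data, so chunks that arrive after send() has returned are still forwarded. Below is a framework-free model of that push flow; the `Completion` class is a tiny hypothetical stand-in for the Deferred that send() must return, and none of these names are Twisted APIs.

```python
class Completion(object):
    """Minimal stand-in for a Deferred: fires once, runs callbacks."""

    def __init__(self):
        self.fired = False
        self._callbacks = []

    def add_callback(self, fn):
        if self.fired:
            fn()
        else:
            self._callbacks.append(fn)

    def fire(self):
        self.fired = True
        for fn in self._callbacks:
            fn()


class FakeHTTPResponse(object):
    """Hypothetical HTTP side: chunks are pushed to whatever callback is set."""

    def __init__(self):
        self.on_chunk = None
        self.done = Completion()

    def deliver(self, chunk):
        self.on_chunk(chunk)

    def finish(self):
        self.done.fire()


class RecordingConsumer(object):
    """Stand-in for the FTP consumer passed to send()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class PushStreamer(object):
    """Analogue of FTPStreamer.send(): no buffer, no read()."""

    def __init__(self, response):
        self.response = response

    def send(self, consumer):
        self.response.on_chunk = consumer.write   # forward chunks as they arrive
        return self.response.done                 # signals end of transfer


response = FakeHTTPResponse()
consumer = RecordingConsumer()
finished = PushStreamer(response).send(consumer)  # returns before any data
log = []
finished.add_callback(lambda: log.append("transfer complete"))

response.deliver(b"hello ")   # data that arrives late is still forwarded
response.deliver(b"world")
response.finish()
print(b"".join(consumer.chunks), log)  # b'hello world' ['transfer complete']
```

End-of-file is signalled explicitly by the HTTP side finishing, not inferred from an empty read.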

You may also want to use Twisted's producer/consumer interface to throttle the transfer. This may be necessary if your connection to the HTTP server is faster than your user's FTP connection to you; otherwise the data you can't send yet piles up in the FTP connection's write buffer.
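The throttling idea can be modeled without Twisted. In Twisted, a streaming producer is registered on a consumer with registerProducer(producer, True), and the consumer calls pauseProducing() when its buffer is full and resumeProducing() once it has drained. Here the HTTP connection plays the producer (for a TCP connection, its transport provides those methods) and the FTP data connection plays the consumer. The classes, the 4000-byte high-water mark, and the chunk and drain sizes below are all hypothetical, chosen only to make the behavior visible.

```python
class HTTPSideProducer(object):
    """Hypothetical stand-in for the HTTP connection. While paused it
    delivers nothing, as a paused TCP transport stops reading the socket."""

    def __init__(self, chunks, consumer):
        self._chunks = list(chunks)
        self.consumer = consumer
        self.paused = False
        self.pause_count = 0

    def pauseProducing(self):
        self.paused = True
        self.pause_count += 1

    def resumeProducing(self):
        self.paused = False

    def pump(self):
        """Deliver queued chunks until paused or out of data."""
        while self._chunks and not self.paused:
            self.consumer.write(self._chunks.pop(0))

    def exhausted(self):
        return not self._chunks


class SlowFTPConsumer(object):
    """Stand-in for the FTP data connection with a bounded send buffer."""

    def __init__(self, high_water):
        self.high_water = high_water
        self.buffered = 0
        self.max_buffered = 0
        self.delivered = 0
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def write(self, data):
        self.buffered += len(data)
        self.max_buffered = max(self.max_buffered, self.buffered)
        if self.buffered >= self.high_water:
            self.producer.pauseProducing()   # stop the fast HTTP side

    def drain(self, n):
        """The slow FTP client accepts up to n bytes."""
        taken = min(n, self.buffered)
        self.buffered -= taken
        self.delivered += taken
        if self.producer.paused and self.buffered < self.high_water:
            self.producer.resumeProducing()


consumer = SlowFTPConsumer(high_water=4000)
producer = HTTPSideProducer([b"x" * 1000] * 50, consumer)
consumer.registerProducer(producer, True)    # True = streaming (push) producer
while not producer.exhausted() or consumer.buffered:
    producer.pump()        # fast side: 1000-byte chunks until paused
    consumer.drain(1500)   # slow side: 1500 bytes per round
print(consumer.delivered, consumer.max_buffered)  # 50000 4500
```

With these numbers the FTP-side buffer never holds more than 4500 bytes, even though the HTTP side could have delivered all 50000 bytes at once.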