
I'm trying (PY)ZMQ for the first time, and wonder if it's possible to send a complete FILE (binary) using PUB/SUB? I need to send database updates to many subscribers. I see examples of short messages but not files. Is it possible?

publisher:

import zmq
import time
import os
import sys

while True:

    print 'loop'
    msg = 'C:\TEMP\personnel.db'

    # Prepare context & publisher
    context = zmq.Context()
    publisher = context.socket(zmq.PUB)
    publisher.bind("tcp://*:2002")
    time.sleep(1)

    curFile = 'C:/TEMP/personnel.db'
    size = os.stat(curFile).st_size
    print 'File size:',size

    target = open(curFile, 'rb')
    file = target.read(size)
    if file:
        publisher.send(file)

    publisher.close()
    context.term()
    target.close()
    time.sleep(10)

subscriber:

'''always listening'''

import zmq
import os
import time
import sys

while True:

    path = 'C:/TEST'
    filename = 'personnel.db'
    destfile = path + '/' + filename

    if os.path.isfile(destfile):
        os.remove(destfile)
        time.sleep(2)

    context = zmq.Context()
    subscriber = context.socket(zmq.SUB)
    subscriber.connect("tcp://127.0.0.1:2002")
    subscriber.setsockopt(zmq.SUBSCRIBE,'')

    msg = subscriber.recv(313344)
    if msg:
        f = open(destfile, 'wb')
        print 'open'
        f.write(msg)
        print 'close\n'
        f.close()

    time.sleep(5)
Have you tried this code? What's wrong with it? - Ahmet Kakıcı
fix your indentation - fferri

1 Answer


You should be able to distribute files to many subscribers using zmq and the PUB/SUB pattern.

Your code is almost there. It would probably work in most situations, but it could be improved a bit.

Things to be aware of

Messages live in memory

The message must fit into memory when it is published (it lives in the PUB socket) and it stays there until the last currently subscribed consumer has read it or disconnected.

The message must also fit into memory when it is received. With reasonably large files (like your 313 kB) this should work unless you are really short on RAM.

Slow consumer issue

If you have multiple consumers and one of them reads much more slowly than the others, the messages waiting for it pile up in the publisher's queue. Once the socket's high-water mark is reached, further messages for that subscriber are dropped. The ZeroMQ guide explains this problem and proposes some ways to handle it (e.g. suicide of the slow subscriber).

However, in most situations, you will not encounter this problem.

Start your consumer first so it does not miss a message

zmq messaging is extremely fast. There is no problem if you start your consumer sooner than the publisher; zmq makes this scenario easy and the consumer will connect automatically.

However, your publisher should give consumers time to connect before it starts publishing. Your code sleeps for 1 second before sending the message, which should be sufficient.

Comments to your code

  • do you really have to sleep after os.remove? Probably not.
  • subscriber.recv - there is no need to know the message size in advance. A zmq message carries its own length, so if you call recv() without a number of bytes, you get the whole message. (In pyzmq the first positional argument of recv is flags, not a size.)
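
A minimal sketch of a subscriber along the lines of the two comments above, assuming Python 3 and pyzmq. The helper names open_subscriber and receive_file are made up for illustration; the endpoint is the one from the question.

```python
def open_subscriber(endpoint="tcp://127.0.0.1:2002"):
    """Create a SUB socket subscribed to every message (requires pyzmq)."""
    import zmq  # imported here so receive_file can be tried without pyzmq

    context = zmq.Context.instance()
    subscriber = context.socket(zmq.SUB)
    subscriber.connect(endpoint)
    subscriber.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix matches everything
    return subscriber


def receive_file(subscriber, destfile):
    """Block until one message arrives, write it to destfile, return its size."""
    data = subscriber.recv()  # no size argument: the whole message is returned
    # Mode 'wb' truncates an existing file, so no os.remove or sleep is needed.
    with open(destfile, "wb") as f:
        f.write(data)
    return len(data)
```

Typical use would be to call open_subscriber() once and then call receive_file(subscriber, 'C:/TEST/personnel.db') in a loop, so the socket is not recreated on every iteration.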

Send large files in chunks

zmq provides a feature called multipart messages, but according to the docs, all parts of a message must fit in memory together before it is sent, so this is not the trick to use.

On the other hand, you can create an "application-level multipart protocol" by sending messages with a structure like (hasNextPart, chunkData). This way you send messages of a well-controlled size, and only the last one has "hasNextPart" == False.

The consumer then reads each part and writes it to disk until the last message arrives, the one stating that there is no further part.
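
The (hasNextPart, chunkData) idea could be sketched roughly as follows, assuming Python 3. The one-byte flag encoding, the chunk size and the function names are all choices made for this illustration, not part of zmq. The socket arguments only need send()/recv(), as pyzmq sockets provide. The sketch ignores the fact that PUB/SUB may drop messages and that a subscriber joining mid-file would only see the tail of the stream.

```python
CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch
MORE = b"\x01"          # flag byte: further parts follow
LAST = b"\x00"          # flag byte: this is the final part


def iter_frames(path, chunk_size=CHUNK_SIZE):
    """Yield frames made of one flag byte followed by a chunk of the file."""
    with open(path, "rb") as f:
        chunk = f.read(chunk_size)
        while True:
            # Read one chunk ahead to know whether the current one is the last.
            next_chunk = f.read(chunk_size)
            has_next = bool(next_chunk)
            yield (MORE if has_next else LAST) + chunk
            if not has_next:
                break
            chunk = next_chunk


def publish_file(publisher, path, chunk_size=CHUNK_SIZE):
    """Send the file as independent messages; return how many were sent."""
    count = 0
    for frame in iter_frames(path, chunk_size):
        publisher.send(frame)
        count += 1
    return count


def receive_chunked_file(subscriber, destfile):
    """Write incoming chunks to destfile until the frame flagged as last."""
    with open(destfile, "wb") as out:
        while True:
            frame = subscriber.recv()
            out.write(frame[1:])
            if frame[:1] == LAST:
                break
```

With this layout neither side holds more than about two chunks in its own buffers at a time, although zmq may still queue several outgoing frames internally.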