
I have a bunch of servers where files are being generated constantly. These files need to be sent to a central location. The files are never larger than 50MB. I am planning to use ZeroMQ to send these files (encapsulated in messages), so that file writes at the central location do not happen concurrently (for example, using scp for the transfers would start many disk-writing processes on the destination).

I can see a few ways to do this with ZeroMQ:

  1. Use REQ sockets on the producers and a single REP socket on the consumer. This could work, but I think it would starve slower producers, as there is no fair queueing. Also, I am not sure if the REQ sockets would drop messages if the REP socket is not available.
  2. Use PUSH sockets on the producers and a PULL socket on the consumer. This has fair queuing on the consumer and the docs say that PUSH sockets never discard messages. However, is it fully reliable?

My reliability requirements are:

  1. Messages (in my case files) should not be lost. So I would like to build it in such a way that there is an acknowledgement to the producer for each message received at the consumer.
  2. Messages from a particular producer should be received in the same order as they were produced.
  3. Producers can come and go, and they should be resistant to the consumer being unavailable for some periods of time.

What sort of sockets are appropriate for this kind of application? Any pointers to what kind of zmq pattern I should be looking at would be great.


1 Answer


The REQ/REP approach seems to be the best fit for this task, since the number of messages is low and high reliability is required.

  1. Store the files on each producer in a way that lets you determine their creation order (a timestamp in the filename, or a file index in a database).
  2. Each producer should select its oldest file, send it to the socket, and wait for an ACK reply. The file should be deleted (or marked as delivered) only once the ACK arrives.
  3. The consumer should read the file content from the socket, flush it to disk, and only then send the ACK message.
  4. The producer should send the next file only after receiving the ACK for the previous one.
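
The steps above could be sketched roughly as follows with pyzmq. This is only an illustration under some assumptions: file names sort in creation order, each file goes out as a two-frame message (name, content), the reply is the literal bytes `ACK`, and the function names are made up for this example. The socket is passed in, so the same functions work with any object that offers `send_multipart`/`recv` (producer) or `recv_multipart`/`send` (consumer).

```python
import os


def oldest_file(spool_dir):
    """Return the path of the oldest pending file in spool_dir, or None."""
    paths = [os.path.join(spool_dir, n) for n in os.listdir(spool_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    # Assumption: names sort in creation order (timestamp or counter prefix).
    return min(files, key=os.path.basename) if files else None


def send_oldest(sock, spool_dir):
    """Producer: send the oldest file on a REQ socket, delete it after the ACK."""
    path = oldest_file(spool_dir)
    if path is None:
        return None
    with open(path, "rb") as f:
        payload = f.read()
    sock.send_multipart([os.path.basename(path).encode(), payload])
    if sock.recv() == b"ACK":  # blocks until the consumer replies
        os.remove(path)
        return path
    return None


def receive_one(sock, dest_dir):
    """Consumer: receive one file on a REP socket, flush to disk, then ACK."""
    name, payload = sock.recv_multipart()
    # basename() guards against a name that tries to escape dest_dir.
    target = os.path.join(dest_dir, os.path.basename(name.decode()))
    with open(target, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    sock.send(b"ACK")  # sent only after the data has reached the disk
    return target


def make_socket(endpoint, producer=True):
    """Create a connected REQ (producer) or bound REP (consumer) socket."""
    import zmq  # pyzmq; imported here so the helpers above work without it

    ctx = zmq.Context.instance()
    if producer:
        sock = ctx.socket(zmq.REQ)
        sock.connect(endpoint)
    else:
        sock = ctx.socket(zmq.REP)
        sock.bind(endpoint)
    return sock
```

A producer would call `send_oldest` in a loop on a socket from `make_socket("tcp://consumer-host:5555")` (a placeholder endpoint), and the consumer would loop over `receive_one`. Note that a plain REQ socket blocks in `recv()` indefinitely if the consumer goes away mid-request, so surviving consumer downtime would additionally need a receive timeout and a fresh socket on retry, along the lines of the "Lazy Pirate" pattern in the ZeroMQ guide.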

This might work, but I see one major problem: several producers sending at once will flood the consumer's network interface, even if they don't touch the disk or spawn processes on the consumer. This is a problem in any design with producer-initiated file transfer; PUSH/PULL sockets have the same problem.

Another point to note: ZeroMQ buffers a message in memory until the whole message has been received. So 20 producers each sending a 50MB file will require about 1GB of RAM at peak.

As an alternative, I would propose sending only the names of the files to the consumer, and having the consumer pull the files sequentially.
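
A minimal sketch of that names-only idea, assuming the announcements travel as small two-frame messages (host, path), for instance over PUSH sockets on the producers and a PULL socket on the consumer. The function names are hypothetical, and `scp_fetch` is just one possible way for the consumer to pull a file; it assumes scp access from the consumer to each producer is already set up.

```python
import subprocess


def announce(sock, host, path):
    """Producer: send only the file's location, not its contents."""
    sock.send_multipart([host.encode(), path.encode()])


def scp_fetch(host, path, dest_dir):
    """One possible fetch: copy host:path into dest_dir with scp."""
    subprocess.run(["scp", f"{host}:{path}", dest_dir], check=True)


def pull_announced(sock, fetch, count):
    """Consumer: handle `count` announcements, fetching one file at a time."""
    fetched = []
    for _ in range(count):
        host, path = (frame.decode() for frame in sock.recv_multipart())
        # fetch() blocks, so only one transfer uses the network and disk at once.
        fetch(host, path)
        fetched.append((host, path))
    return fetched
```

On the consumer this might be driven as `pull_announced(sock, lambda h, p: scp_fetch(h, p, "/data/incoming"), count=1)` in a loop, where `/data/incoming` is a placeholder. Because the consumer initiates each transfer, only one file is in flight at a time and the announcements themselves stay tiny. The producer would still need some confirmation before deleting a file, which this sketch leaves out.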