I'm developing a simulation built on massively distributed cellular automata. Cell simulations are distributed across nodes and coordinated using ZooKeeper. Persistent data is stored in Riak. The cellular automata themselves are written in Python.
It would be very convenient for my simulation if a cell could pass a low volume of messages (somewhere between a few and a few tens per second, probably key-value pairs) to its immediate neighbors (the Manhattan neighborhood). However, for a simulation of millions of cells, the naïve approach ends up with millions of little mailboxes, one per cell, each receiving a slow trickle of messages. This brings ZooKeeper or RabbitMQ to its knees! DDS has been recommended to me, but it appears to be very Enterprise, and I can't find any Python bindings for it.
I'm new to distributed systems development; this is really just a hobby project to see how far I can get. I can't help but feel that I'm going about this the wrong way by turning to a monolithic message bus for each little cell's mailbox. It's easy for a cell to determine its neighbors and its place in the world, so it seems like the message passing ought to be amenable to some kind of chunking, where a regional actor handles the mail for a block of adjacent cells. The design of this regional actor and how it would communicate with the individual cells escapes me, however. I see how the cells could pass messages to the chunk via a message bus, but how would the chunk pass messages back to the cell?
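To make the chunking idea concrete, here is a rough sketch of what I imagine, and I'm not at all sure it's right. It assumes every cell in a chunk lives in the same process as the chunk object, so "passing a message back" is just the cell reading its own list out of a dictionary. All of the names (`Chunk`, `chunk_of`, `CHUNK_SIZE`, and so on) are made up for illustration, and the actual transport between chunks is left out entirely.

```python
from collections import defaultdict

CHUNK_SIZE = 64  # hypothetical: number of cells along one edge of a chunk


def chunk_of(x, y, size=CHUNK_SIZE):
    """Map a cell coordinate to the coordinate of the chunk that owns it."""
    return (x // size, y // size)


def neighbors(x, y):
    """The four Manhattan neighbors of a cell."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


class Chunk:
    """Owns a square block of cells and routes their messages."""

    def __init__(self, coord, size=CHUNK_SIZE):
        self.coord = coord
        self.size = size
        self.inboxes = defaultdict(list)   # cell (x, y) -> pending messages
        self.outgoing = defaultdict(list)  # remote chunk coord -> batched messages

    def send(self, dest, key, value):
        """Called by a local cell: deliver locally, or queue for another chunk."""
        target = chunk_of(*dest, self.size)
        if target == self.coord:
            self.inboxes[dest].append((key, value))
        else:
            self.outgoing[target].append((dest, key, value))

    def receive_batch(self, batch):
        """Called when a batch arrives from another chunk."""
        for dest, key, value in batch:
            self.inboxes[dest].append((key, value))

    def flush(self):
        """Return one batch per remote chunk and empty the outgoing queue."""
        batches = dict(self.outgoing)
        self.outgoing.clear()
        return batches

    def drain(self, cell):
        """Called by a cell at the start of its step to collect its mail."""
        return self.inboxes.pop(cell, [])
```

With something like this, the message bus would only ever see one mailbox per chunk rather than one per cell: each chunk would call `flush()` once per tick, ship each batch to the chunk it is addressed to, and that chunk would call `receive_batch()` on arrival. Whether that is a sensible division of labor is exactly what I'm unsure about.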
Am I heading anywhere near a real solution to this problem? What's the proper way for distributed nodes to pass low-volume messages to their neighbors?