I am trying to understand how Celery and AMQP work.

My scenario

I install Celery on my machine:

pip install celery

I make tasks using

from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://')

@app.task
def print_hello():
    print('hello there')

As far as I understand, Celery converts this task into a message and sends it to a broker (Redis or RabbitMQ) via the AMQP protocol. These messages are then queued and delivered to worker nodes for processing.

My questions are,

  1. Suppose I created a task in a Java environment and the message is sent to an external worker node. Does that mean the worker node server must have Java installed to execute the task?
  2. If the message is picked up by an external worker node, how do the worker node and the broker find each other? In the above code I only have the broker address to store tasks.

Also, why are we storing the tasks in a broker? Why couldn't we implement an exchange algorithm in Celery and send the message directly to the workers?

What is the difference between SOAP and AMQP?

2 Answers

1
votes

The workers need not only Python, but all the code for the tasks you want to run on them.

But you don't address the nodes specifically; that is precisely why there is a broker. You put your tasks on the queue, and the workers pick them up.
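To illustrate the idea, here is a minimal sketch that uses only the Python standard library. It is an analogy, not Celery code: `queue.Queue` stands in for a queue on the broker, threads stand in for worker processes, and the names `worker`, `task_queue` and `results` are made up for this example. The client publishes messages without naming any worker, and whichever worker is free takes the next one.

```python
import queue
import threading

def worker(name, task_queue, results):
    """Pull messages off the shared queue until a None sentinel arrives."""
    while True:
        message = task_queue.get()
        if message is None:
            break
        # Record which worker handled which message.
        results.append((name, message))

task_queue = queue.Queue()  # stand-in for a queue on the broker
results = []

workers = [
    threading.Thread(target=worker, args=(f"worker-{i}", task_queue, results))
    for i in range(2)
]
for thread in workers:
    thread.start()

# Client side: publish five messages without addressing any worker.
for n in range(5):
    task_queue.put({"task": "print_hello", "id": n})

# One sentinel per worker so that each of them shuts down.
for _ in workers:
    task_queue.put(None)
for thread in workers:
    thread.join()

handled_ids = sorted(message["id"] for _, message in results)
print(handled_ids)  # → [0, 1, 2, 3, 4]
```

Which worker handles which message varies from run to run, but each message is handled exactly once.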

I have no idea why you've mentioned SOAP in this context. It has nothing whatsoever to do with anything.

0
votes

The specific answers to your questions are:

  1. "if the message is sent to an external worker node" is slightly misleading. A message is not sent to a worker node per se. It is sent to the Broker (identified by a URL), and specifically to an Exchange on that broker with a Routing Key, which determines the Queue it lands in. Workers are all configured with the same Broker URL and read this Queue, and it's very much a case of first-in-best-dressed: the first Worker to consume the message gets it (in AMQP, reading a message removes it from the Queue in one atomic operation). The messages are language independent. The Workers, however, are written in Python and the task definition must be in Python, though the Python task definition can of course call out to any other library by whatever means to execute the task. So in a sense, yes: whatever runtime libraries your task needs must be installed on the same machine as the Worker, and they must have a Python wrapper around them so the Worker can load them.

  2. "If the message is picked by external worker node, how does worker node and broker find each other?" - This question is misleading. They don't find each other. The Worker is configured with the exact same Broker URL as the Client is. It has to know the URL. The way Celery typically solves this in Python is that the code snippet you shared is loaded by both the Client and the Worker. This is in fact one of the beauties of Celery: you write your tasks in Python and you load the definitions in the Worker unaltered. They thus use the same Broker and have the same Task defined. The @app.task decorator actually creates a Task class instance which has two very important methods: apply_async(), which creates and sends the message requesting the task, and run(), which runs the decorated function. The former is called in the Client, the latter by the Worker (to actually run the task).

  3. "Why are we storing the tasks in a broker?" - Tasks are not stored in a broker. The task is defined in a Python file like your code snippet. As described in 2, the same definition is read by both Client and Worker. A message is sent from the Client to the Worker, by way of the Broker, asking it to run the task.

  4. "Why couldn't we implement exchange algorithm in celery and send the message direct to workers?" - I'll have to take a guess here, but I would ask: why reinvent the wheel? There is a defined standard, AMQP (the Advanced Message Queuing Protocol), and there are a number of implementations of that standard. Why write yet another one? Celery is FOSS, and like so much FOSS, I imagine the people who started writing it wanted to focus on task management rather than message management and chose to lean on AMQP for the latter. A fair choice. But for what it's worth, Celery does implement quite a lot in Kombu to provide a Python API to AMQP.
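Points 1 to 3 can be sketched in plain Python. This is a toy model and not Celery's implementation: the `Task` class, `registry`, `broker_queue` and `worker_step` are hypothetical names invented for this example, and the JSON message is a simplification rather than Celery's real message format. It shows that the message carries only a task name and arguments, which is why the message is language independent while the Worker still needs the Python definition loaded.

```python
import json
import queue

broker_queue = queue.Queue()  # stand-in for a queue on the broker
registry = {}                 # task name -> Task, filled when the module is loaded

class Task:
    """Toy task wrapper with a client-side and a worker-side method."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        registry[self.name] = self

    def apply_async(self, args=()):
        # Client side: serialize a request and publish it.
        # The function itself is NOT executed here.
        broker_queue.put(json.dumps({"task": self.name, "args": list(args)}))

    def run(self, *args):
        # Worker side: execute the decorated function.
        return self.func(*args)

def task(func):
    """Decorator that replaces the function with a Task instance."""
    return Task(func)

@task
def add(x, y):
    return x + y

def worker_step():
    """Consume one message, look the task up by name, and run it."""
    message = json.loads(broker_queue.get())
    return registry[message["task"]].run(*message["args"])

# Client: request the task. Only a small JSON message crosses the queue.
add.apply_async(args=(2, 3))

# Worker: has the same module loaded, so it can resolve "add" by name.
result = worker_step()
print(result)  # → 5
```

If the Worker had not loaded the module that defines `add`, the lookup in `registry` would fail. That mirrors why the same task definitions have to be available on both sides.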

SOAP (abbreviation for Simple Object Access Protocol) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks.

AMQP (abbreviation for Advanced Message Queuing Protocol) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security.
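As a rough illustration of the routing feature, the sketch below models two exchange types from the AMQP 0-9-1 model used by brokers such as RabbitMQ: a direct exchange (point-to-point by routing key) and a fanout exchange (publish-and-subscribe). `ToyExchange` and the queue names are invented for this example. It runs entirely in memory and does not talk to a real broker.

```python
from collections import defaultdict, deque

class ToyExchange:
    """In-memory model of an exchange that routes messages to bound queues."""

    def __init__(self, kind):
        self.kind = kind    # "direct" or "fanout"
        self.bindings = []  # list of (routing_key, queue_name) pairs

    def bind(self, queue_name, routing_key=""):
        self.bindings.append((routing_key, queue_name))

    def publish(self, queues, message, routing_key=""):
        for key, queue_name in self.bindings:
            # fanout ignores the routing key; direct needs an exact match.
            if self.kind == "fanout" or key == routing_key:
                queues[queue_name].append(message)

queues = defaultdict(deque)

# Point-to-point: only the queue bound with the matching key gets the message.
direct = ToyExchange("direct")
direct.bind("emails", routing_key="email")
direct.bind("reports", routing_key="report")
direct.publish(queues, "send welcome mail", routing_key="email")

# Publish-and-subscribe: every bound queue gets its own copy.
fanout = ToyExchange("fanout")
fanout.bind("audit")
fanout.bind("metrics")
fanout.publish(queues, "user signed up")

print(list(queues["emails"]))   # → ['send welcome mail']
print(list(queues["reports"]))  # → []
print(list(queues["audit"]))    # → ['user signed up']
print(list(queues["metrics"]))  # → ['user signed up']
```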

SOAP typically sits much higher in the protocol stack. The differences are described here:

https://www.amqp.org/product/different