I am using Celery standalone (not within Django). I am planning to run one worker task type on multiple physical machines. The task does the following (a rough skeleton is sketched below):
- Accept an XML document.
- Transform it.
- Make multiple database reads and writes.
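To make that concrete, here is a stripped-down skeleton of the task (the broker URL and the `transform` helper are placeholders, not my real code):

```python
from celery import Celery

app = Celery('tasks', broker='amqp://localhost')  # placeholder broker URL

def transform(xml_payload):
    """Placeholder for the real XML transformation."""
    return xml_payload

@app.task
def process_document(xml_payload):
    record = transform(xml_payload)
    # ...several reads and writes against PostgreSQL happen here...
```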
I'm using PostgreSQL, but this would apply equally to other datastores that use connections. In the past, I've used a database connection pool to avoid creating a new connection on every request and to avoid keeping connections open for too long. However, since each Celery worker runs in a separate process, I'm not sure how the workers would actually be able to share a pool. Am I missing something?

I know that Celery lets you persist the result returned from a task (via a result backend), but that is not what I'm trying to do here. Each task can perform several different updates or inserts depending on the data it processes.
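For what it's worth, this is the kind of per-process setup I've been experimenting with: opening a psycopg2 pool when each worker process starts, using the `worker_process_init` signal (a rough sketch building on the skeleton above; the pool sizes and DSN are made up). I just don't know whether this is the intended pattern, or whether a pool confined to a single worker process is even buying me anything:

```python
from celery import Celery
from celery.signals import worker_process_init
from psycopg2 import pool

app = Celery('tasks', broker='amqp://localhost')  # placeholder broker URL

db_pool = None  # one pool per worker process, created after the fork

@worker_process_init.connect
def init_db_pool(**kwargs):
    global db_pool
    db_pool = pool.ThreadedConnectionPool(
        minconn=1,
        maxconn=5,  # made-up sizes
        dsn='dbname=mydb user=me host=localhost',  # placeholder DSN
    )

@app.task
def process_document(xml_payload):
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute('SELECT 1')  # stands in for the real reads/writes
        conn.commit()
    finally:
        db_pool.putconn(conn)
```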
What is the right way to access a database from within a Celery worker?
Is it possible to share a connection pool across multiple workers/tasks, or is there some other way to do this?