
I have a service that uses the Python logging module to log debug messages.

my_service.py:

import logging

logger = logging.getLogger(__name__)

class SomeService:
    def synchronize(self):
        logger.debug('synchronizing stuff')
        external_library.call('do it')
        logger.debug('found x results')

Then I use this service from a Celery task.

tasks.py:

from celery import shared_task
from my_service import some_service  # assumes a module-level SomeService() instance

@shared_task
def synchronize_stuff():
    stuff = some_service.synchronize()

The worker then outputs logs like this:

worker_1     | [2019-01-22 11:39:19,232: DEBUG/MainProcess] Task accepted: my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8] pid:12
worker_1     | [2019-01-22 11:39:19,237: DEBUG/ForkPoolWorker-1] Starting new HTTPS connection (1): example.com:443
worker_1     | [2019-01-22 11:39:19,839: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/stuff HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:19,860: DEBUG/ForkPoolWorker-1] Processing 35
worker_1     | [2019-01-22 11:39:19,862: DEBUG/ForkPoolWorker-1] Item 35 already closed, ignoring.
worker_1     | [2019-01-22 11:39:19,863: DEBUG/ForkPoolWorker-1] Processing 36
worker_1     | [2019-01-22 11:39:19,865: DEBUG/ForkPoolWorker-1] Item 36 already closed, ignoring.
worker_1     | [2019-01-22 11:39:19,865: DEBUG/ForkPoolWorker-1] Processing 49
worker_1     | [2019-01-22 11:39:20,380: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/49 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:20,429: DEBUG/ForkPoolWorker-1] Processing 50
worker_1     | [2019-01-22 11:39:20,680: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/50 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:20,693: DEBUG/ForkPoolWorker-1] Processing 51
worker_1     | [2019-01-22 11:39:21,138: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/51 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:21,197: INFO/ForkPoolWorker-1] Task my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8] succeeded in 1.9656380449960125s: None

This is good enough for debugging, but I'd like to include the task name and UUID in these logs. That can be achieved with the Celery task logger, like this:

my_service.py:

from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)

class SomeService:
    def synchronize(self):
        logger.debug('synchronizing stuff')
        external_library.call('do it')
        logger.debug('found x results')

This does exactly what I want in terms of logging:

worker_1     | [2019-01-22 11:39:19,232: DEBUG/MainProcess] Task accepted: my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8] pid:12
worker_1     | [2019-01-22 11:39:19,237: DEBUG/ForkPoolWorker-1] Starting new HTTPS connection (1): example.com:443
worker_1     | [2019-01-22 11:39:19,839: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/stuff HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:19,860: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Processing 35
worker_1     | [2019-01-22 11:39:19,862: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Item 35 already closed, ignoring.
worker_1     | [2019-01-22 11:39:19,863: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Processing 36
worker_1     | [2019-01-22 11:39:19,865: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Item 36 already closed, ignoring.
worker_1     | [2019-01-22 11:39:19,865: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Processing 49
worker_1     | [2019-01-22 11:39:20,380: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/49 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:20,429: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Processing 50
worker_1     | [2019-01-22 11:39:20,680: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/50 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:20,693: DEBUG/ForkPoolWorker-1] my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8]: Processing 51
worker_1     | [2019-01-22 11:39:21,138: DEBUG/ForkPoolWorker-1] https://example.com:443 "GET /api/detail/51 HTTP/1.1" 200 None
worker_1     | [2019-01-22 11:39:21,197: INFO/ForkPoolWorker-1] Task my_task_name[48d706d7-0d92-43aa-aa9d-d5db8d660af8] succeeded in 1.9656380449960125s: None

But I have two problems with this:

  1. I don't want to use the Celery logger inside the service. The service can be used even in environments where Celery is not installed at all (in which case it's fine that the task name and UUID are not included in the logs).

  2. Logs from external libraries executed during the same task do not use the same logger, and therefore do not include the task name and UUID.

This leads me to my question: is it possible to specify (force) a logger at the task level (in tasks.py) that will be used no matter how I log in my service or how external libraries log? Something like this would be fine:

tasks.py:

@shared_task
def synchronize_stuff():
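    # hypothetical API: neither enforce_logger() nor restore_logger()
    # exists in the stdlib logging module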
    logging.enforce_logger(get_task_logger(__name__))
    stuff = some_service.synchronize()
    logging.restore_logger()

It might also be worth noting that I use Django in this project.

Thanks!

If you can, I would suggest using django-background-tasks, which is much easier than Celery here. Have a look at my answer here: stackoverflow.com/questions/54225303/… – Ahtisham
@Ahtisham We actually use other Celery features like the scheduler, periodic tasks, and Flower for monitoring. This is only a minor issue that, when solved, would make debugging easier, but it is not a deal-breaker. – Martin Janeček

1 Answer


It is not exactly what you are looking for, but I had a similar problem and solved it with a logging filter applied to the handler that ships logs to the service where I do not want the Celery log messages. I describe my problem and my solution in this question: How can I log from my python application to splunk, if I use celery as my task scheduler?
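
For illustration, a minimal sketch of that idea (the filter class name is mine, and the StreamHandler is a stand-in for whatever handler actually ships logs to the external service):

import logging

class SkipCeleryFilter(logging.Filter):
    """Reject records emitted by Celery's own logger hierarchy."""
    def filter(self, record):
        # record.name is the name of the logger that emitted the record
        return not record.name.startswith('celery')

handler = logging.StreamHandler()  # stand-in for the external-service handler
handler.addFilter(SkipCeleryFilter())
logging.getLogger().addHandler(handler)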

Tell me if this points in the right direction...

Also, I had very good results using Python's logging.dictConfig!
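
A rough sketch of how such a filter could be wired up with dictConfig (the dotted path to the filter class is illustrative; since you use Django, the same dict can go into the LOGGING setting):

import logging.config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'skip_celery': {
            # '()' tells dictConfig to instantiate a custom filter class
            '()': 'myproject.logging_filters.SkipCeleryFilter',  # illustrative path
        },
    },
    'formatters': {
        'default': {
            'format': '[%(asctime)s: %(levelname)s/%(processName)s] %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'default',
            'filters': ['skip_celery'],
        },
    },
    'root': {
        'level': 'DEBUG',
        'handlers': ['console'],
    },
}

logging.config.dictConfig(LOGGING)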