I have a periodic task that I am implementing on heroku procfile using worker:

Procfile

web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker --beat --events --loglevel info

tasks.py

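from datetime import timedelta

from celery.task import PeriodicTask  # class-based periodic tasks (Celery 3.x API)


class PullXXXActivityTask(PeriodicTask):
    """
    A periodic task that fetches data every minute.
    """
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        # MyModel is the application's own model (its import is not shown here).
        for rk in MyModel.objects.all():
            rk.pull()
        logger = self.get_logger(**kwargs)
        logger.info("Running periodic task for XXX.")

        return True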

For this periodic task I need --beat: I checked by turning it off, and the task no longer repeats. So, in effect, --beat does the work of a clock process (https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes).

My concern is this: if I scale the worker to two dynos with heroku ps:scale worker=2, the logs show two beats running, one on worker.1 and one on worker.2:

Aug 25 09:38:11 emstaging app/worker.2: [2014-08-25 16:38:11,580: INFO/Beat] Scheduler: Sending due task apps.notification.tasks.SendPushNotificationTask (apps.notification.tasks.SendPushNotificationTask)
Aug 25 09:38:20 emstaging app/worker.1: [2014-08-25 16:38:20,239: INFO/Beat] Scheduler: Sending due task apps.notification.tasks.SendPushNotificationTask (apps.notification.tasks.SendPushNotificationTask) 

The log shown is for a different periodic task, but the key point is that each worker dyno is told to run the same task by its own clock. There should instead be a single clock that ticks, decides every XX seconds what is due, and hands that task to the least loaded worker.n dyno.
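
To make the duplication concrete, here is a toy model in plain Python (not Celery code). It assumes each beat is an independent clock that enqueues the task once per interval and knows nothing about the other beats; simulate_enqueues and its parameters are hypothetical names chosen only for illustration.

```python
def simulate_enqueues(num_beats, interval_s, duration_s):
    """Count how often one periodic task is enqueued (toy model, not Celery).

    Assumption: every beat keeps its own clock and sends the task each time
    its interval elapses, without coordinating with any other beat.
    """
    enqueued = []
    for beat_id in range(num_beats):
        # Each beat fires at interval_s, 2 * interval_s, ... up to duration_s.
        for tick in range(interval_s, duration_s + 1, interval_s):
            enqueued.append((tick, beat_id))
    return len(enqueued)


# Ten minutes of a one-minute task, with one beat and with two beats.
one_beat = simulate_enqueues(num_beats=1, interval_s=60, duration_s=600)
two_beats = simulate_enqueues(num_beats=2, interval_s=60, duration_s=600)
print(one_beat, two_beats)
```

Under these assumptions the number of enqueued runs grows linearly with the number of beats, which matches the two "Sending due task" lines in the log above.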

More on why a single clock is essential: https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes#custom-clock-processes

Is this a problem, and if so, how can I avoid it?

1 Answer

You should have a separate worker for the beat process.

web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker --events --loglevel info
beat: celery -A voltbe2 beat 

Now you can scale the worker process without affecting the beat process.

Alternatively, if you won't always need the extra process, you can keep using -B (the short form of --beat) in the worker process and add a second process type, say extra_worker, that runs a plain worker without beat. Leave extra_worker at 0 dynos normally and scale it up as necessary. The important thing is to always keep the process that runs the beat at exactly 1 dyno.
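
The "exactly one beat" rule can be checked mechanically before scaling. The following is a rough sketch, not a Heroku or Celery API: beat_count is a hypothetical helper that scans Procfile text for a beat subcommand or the --beat / -B worker flags, then multiplies by an assumed dyno formation passed in as a dict. The extra_worker entry mirrors the hypothetical process type named above.

```python
# Procfile layouts discussed in this thread, as plain strings.
SEPARATE_BEAT = """
web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker --events --loglevel info
beat: celery -A voltbe2 beat
"""

EMBEDDED_BEAT = """
web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker -B --events --loglevel info
extra_worker: celery -A voltbe2 worker --events --loglevel info
"""


def beat_count(procfile_text, formation):
    """Return how many beat schedulers a dyno formation would start.

    formation maps a process type name to its dyno count; process types
    missing from the dict are treated as scaled to 0.
    """
    total = 0
    for line in procfile_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, command = line.partition(":")
        tokens = command.split()
        # A process runs beat if it is `celery ... beat` or a worker with -B/--beat.
        runs_beat = "beat" in tokens or "--beat" in tokens or "-B" in tokens
        if runs_beat:
            total += formation.get(name.strip(), 0)
    return total


# Dedicated beat process: workers scale freely, beat stays at one.
print(beat_count(SEPARATE_BEAT, {"web": 1, "worker": 3, "beat": 1}))
# Embedded beat: scale extra_worker, never the worker that carries -B.
print(beat_count(EMBEDDED_BEAT, {"web": 1, "worker": 1, "extra_worker": 2}))
# The problematic case from the question: the beat-carrying worker scaled to 2.
print(beat_count(EMBEDDED_BEAT, {"web": 1, "worker": 2}))
```

Any result other than 1 means the formation would either run duplicate schedulers or none at all.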