Worker timeouts when setting up New Relic with Tornado 4 and Gunicorn

Question

I am trying to configure New Relic for my Gunicorn + Tornado 4 app.

Locally, without Gunicorn (running Tornado's own HTTP server directly), the New Relic setup works and I can see data in New Relic. I use the following code to configure the New Relic agent:

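import os

# Initialize the agent only when a config file path is provided.
config_file = os.environ.get('NEW_RELIC_CONFIG_FILE')
if config_file:
    import newrelic.agent
    # IS_PROD is an application-level flag (not shown here).
    environment = 'production' if IS_PROD else 'development'
    newrelic.agent.initialize(config_file, environment=environment)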

However, in production, with Gunicorn, the workers time out and restart indefinitely:

gunicorn -b 0.0.0.0:8080 -w 3 -p gunicorn.pid -k tornado --access-logfile /var/log/gunicorn_access.log --error-logfile /var/log/gunicorn_error.log myapp.server:make_application\(\) -t 2 --log-level DEBUG --capture-output &> /dev/null &

...

[2017-01-17 05:16:37 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26985)
[2017-01-17 05:16:37 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26986)
[2017-01-17 05:16:37 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26987)
[2017-01-17 05:16:37 +0000] [26991] [INFO] Booting worker with pid: 26991
[2017-01-17 05:16:37 +0000] [26992] [INFO] Booting worker with pid: 26992
[2017-01-17 05:16:37 +0000] [26993] [INFO] Booting worker with pid: 26993
[2017-01-17 05:16:40 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26992)
[2017-01-17 05:16:40 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26993)
[2017-01-17 05:16:40 +0000] [26957] [CRITICAL] WORKER TIMEOUT (pid:26991)
[2017-01-17 05:16:40 +0000] [26997] [INFO] Booting worker with pid: 26997
[2017-01-17 05:16:40 +0000] [26998] [INFO] Booting worker with pid: 26998
[2017-01-17 05:16:40 +0000] [26999] [INFO] Booting worker with pid: 26999

If I comment out the agent code above and run the same gunicorn command, the workers are stable and do not time out.

Despite setting the log level to DEBUG, I cannot find the root cause of why the Gunicorn workers time out and reboot indefinitely. All I know is that the New Relic agent code above is the culprit.

Since I can integrate with New Relic successfully locally, I suspect my newrelic.ini and the New Relic agent code above are fine. Gunicorn is somehow interfering, but I am not sure how, or where I should begin troubleshooting.
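
One possible starting point for troubleshooting, sketched here as an assumption rather than something tried in this setup: Gunicorn's master sends SIGABRT to a worker that exceeds the timeout, and a `worker_abort` server hook in a Gunicorn config file can log where the worker was stuck. The file name `gunicorn.conf.py` and the helper `format_thread_stacks` are hypothetical, and this assumes the installed Gunicorn version supports the `worker_abort` hook. If the worker is blocked in a way that prevents Python signal handlers from running (for example, heavy swapping), the hook may never fire.

```python
# gunicorn.conf.py (hypothetical name), loaded with: gunicorn -c gunicorn.conf.py ...
import sys
import traceback


def format_thread_stacks():
    """Return the current stack of every thread in this process as text."""
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s:\n" % thread_id)
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def worker_abort(worker):
    # Gunicorn server hook: called in the worker when it receives SIGABRT,
    # which the master sends once the worker exceeds the timeout.
    worker.log.critical(
        "worker %s aborted; stacks:\n%s", worker.pid, format_thread_stacks()
    )
```

With a timeout as short as the `-t 2` used above, the stacks logged by this hook would show whether the worker was still importing or initializing when the master gave up on it.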

I am using:

newrelic==2.78.0.57
gunicorn==19.6.0
tornado==4.4

deruse · Accepted Answer · 2017-01-17T05:38:52

Wow, it ended up being a memory issue. When I spawn 1 worker instead of 3, everything works. The New Relic instrumentation was just barely tipping my memory usage over the edge.
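
To check whether memory is the limiting factor in a similar setup, a rough sketch like the following can report the peak resident memory of a process using only the standard library. The helper name `peak_rss_mb` is made up for illustration, and the `resource` module is only available on Unix-like systems.

```python
import os
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


if __name__ == "__main__":
    print("pid %d peak RSS: %.1f MB" % (os.getpid(), peak_rss_mb()))
```

Calling `peak_rss_mb()` once before and once after `newrelic.agent.initialize(...)` in a worker gives a rough idea of what the agent adds. Multiplying the per-worker figure by the number of workers (`-w 3` above) and comparing it with the memory available on the host shows how close the machine is to its limit.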
