Presto Worker Graceful Shutdown

Question

As part of an effort to auto-scale our Presto cluster, we would like to gracefully shut down a Presto worker before letting EC2 terminate it. We ran the following command:

curl -v -XPUT --data '"SHUTTING_DOWN"' -H "Content-type: application/json" http://250.0.46.167:8081/v1/info/state

The worker log shows "com.facebook.presto.server.GracefulShutdownHandler Shutdown requested" immediately, and soon afterwards the node changes to the "shutting_down" state in the coordinator. The worker process finally exits after 4 minutes (because of two grace periods of 2 minutes each, not because of any pending query).

So far so good. However, as expected for any managed daemon, the worker process is immediately restarted, and soon the node is back to "active" in the coordinator.

We wish graceful shutdown in Presto worked roughly as follows: the shutdown request is sent to the coordinator (instead of the worker). The coordinator tells the worker to shut down and later logically removes it from the active node list. If the worker restarts and registers again, the coordinator ignores it for the next hour.

I wonder how current Presto cluster owners/operators handle this issue?

Great idea. I created github.com/prestodb/presto/issues/12090 — danzhi

Thiago Dantas · Accepted Answer · 2020-12-23T22:29:09

What I did was this: after the PUT request, I kept polling the GET on the same resource at a 1-second interval and powered off the instance as soon as it either failed to respond (service down) or returned ACTIVE (service restarted).

1 second is not nearly enough time for the service to come fully back online after shutting down gracefully, so this has been working fine for me.
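For anyone who wants to script this, here is a rough Python sketch of the polling loop described above. It is not the exact script I ran. It assumes that GET on /v1/info/state returns the state as a JSON string such as "ACTIVE", and the function names (request_shutdown, fetch_state, wait_until_safe_to_poweroff) are made up for illustration. The actual poweroff or EC2 terminate call is left to the caller.

```python
import json
import time
import urllib.error
import urllib.request


def request_shutdown(state_url, timeout=5):
    """Send the same PUT as the curl command in the question."""
    req = urllib.request.Request(
        state_url,
        data=json.dumps("SHUTTING_DOWN").encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def fetch_state(state_url, timeout=2):
    """GET the worker state; return None if the worker does not answer.

    Assumption: the response body is a JSON string such as "ACTIVE".
    """
    try:
        with urllib.request.urlopen(state_url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or unparsable body:
        # treat all of these as "service down".
        return None


def wait_until_safe_to_poweroff(get_state, interval=1.0, sleep=time.sleep):
    """Poll until the worker stops answering or reports ACTIVE again.

    get_state is any zero-argument callable returning the current state
    (or None when the worker is unreachable), which keeps the loop easy
    to test without a real worker.
    """
    while True:
        state = get_state()
        if state is None:
            return "down"       # service is gone, safe to power off
        if state == "ACTIVE":
            return "restarted"  # daemon manager brought it back up
        sleep(interval)
```

Typical use would be to call request_shutdown(url) once, then wait_until_safe_to_poweroff(lambda: fetch_state(url)), and power off the instance as soon as the second call returns, whichever of the two reasons it reports.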

