1
votes

As part of effort to auto-scale our presto cluster, we like to graceful shutdown a presto worker before let EC2 terminate it. After following command

curl -v -XPUT --data '"SHUTTING_DOWN"' -H "Content-type: application/json" http://250.0.46.167:8081/v1/info/state

The worker log indicates "com.facebook.presto.server.GracefulShutdownHandler Shutdown requested" immediately and soon the node in coordinator change to "shutting_down" state. The worker process finally exit after 4 minutes (due to double grace period of 2 minutes instead of pending query).

So far so good, however as expected for any managed daemon. The worker process is immediately restarted, and soon the node is back to "active" in coordinator.

We wish the graceful shutdown in Presto roughly work like below: shutdown request will be sent to the coordinator (instead of worker). coordinator tells the worker to shutdown and later logically removes it from active node list. If the worker restarts and register back, coordinator will ignore it for the next hour.

I wonder how current Presto clusters owner/operators handle this issue?

2
I'd file an issue with the Presto project. - Dain Sundstrom

2 Answers

0
votes

What I did was after the PUT request i kept polling the output of the GET on the same resource with 1 second interval and poweroff the instance as soon as it eithers fail to respond (service down) or returns ACTIVE (service restarted)

1 second is not nearly enough for the service to come fully online again after shutting down gracefully so this has been working fine for me

0
votes

The graceful shutdown documentation includes all relevant details of the HTTP PUT, and how it works overall.

Typically cluster nodes a generally running for very long periods of time. The coordinator automatically stops scheduling to the worker in shutdown and removes it form the pool.