23 votes

I am trying to restart an AWS ECS service (basically stop and start all tasks within the service) without making any changes to the task definition.

The reason is that every build pushes the image with the same latest tag.

I have tried stopping all tasks and letting the service recreate them, but that leaves the app temporarily unavailable while the tasks are restarting on my two instances.

What is the best way to handle this? Say, a blue-green deployment strategy, so that there is no downtime?

This is what I have currently. Its shortcoming is that my app is down for a couple of seconds while the service's tasks are recreated after being stopped.

configure_aws_cli(){
    aws --version
    aws configure set default.region us-east-1
    aws configure set default.output json
}

start_tasks() {
    # Start a fresh task from the current task definition on the given container instance.
    start_task=$(aws ecs start-task --cluster $CLUSTER --task-definition $DEFINITION --container-instances $EC2_INSTANCE --group $SERVICE_GROUP --started-by $SERVICE_ID)
    echo "$start_task"
}

stop_running_tasks() {
    # Collect the ARNs of every task currently running for the service and stop them one by one.
    tasks=$(aws ecs list-tasks --cluster $CLUSTER --service-name $SERVICE | $JQ ".taskArns | . []");
    tasks=( $tasks )
    for task in "${tasks[@]}"
    do
        [[ ! -z "$task" ]] && stop_task=$(aws ecs stop-task --cluster $CLUSTER --task "$task")
    done
}

push_ecr_image(){
    echo "Push built image to ECR"
    eval $(aws ecr get-login --region us-east-1)
    docker push $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/repository:$TAG
}

configure_aws_cli
push_ecr_image
stop_running_tasks
start_tasks

7 Answers

35 votes

Use update-service and the --force-new-deployment flag:

aws ecs update-service --force-new-deployment --service my-service --cluster cluster-name
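
You can then block until the rollout has settled before continuing a deployment script. A minimal sketch, using the same hypothetical service and cluster names:

# Returns once the service has reached a steady state again (or fails on timeout).
aws ecs wait services-stable --cluster cluster-name --services my-service
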
10 votes

Hold on a sec. If I understood your use case correctly, this is addressed in the official docs:

If your updated Docker image uses the same tag as what is in the existing task definition for your service (for example, my_image:latest), you do not need to create a new revision of your task definition. You can update the service using the procedure below, keep the current settings for your service, and select Force new deployment....

To avoid downtime, you should tune two parameters: minimum healthy percent and maximum percent:

For example, if your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.

This basically means that, regardless of whether and how much your task definition changed, there can be an "overlap" between the old tasks and the new ones, and this is the way to achieve resilience and reliability.
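
For example, both values can also be set from the CLI at the moment you force the deployment. A minimal sketch, with hypothetical cluster and service names:

# Keep at least the desired count healthy and allow up to double of it during the rollout.
aws ecs update-service --cluster my-cluster --service my-service \
    --deployment-configuration "maximumPercent=200,minimumHealthyPercent=100" \
    --force-new-deployment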

UPDATE: Amazon has just introduced external deployment controllers for ECS (both EC2 and Fargate). This adds a new level of abstraction called a TaskSet. I haven't tried it myself yet, but such fine-grained control over service and task management (both APIs are supported) can potentially solve problems like this one.

5 votes

After you push your new image to your Docker repository, you can create a new revision of your task definition (it can be identical to the existing task definition) and update your service to use the new task definition revision. This will trigger a service deployment, and your service will pull the new image from your repository.

This way your task definition stays the same (although updating the service to a new task definition revision is required to trigger the image pull), and still uses the "latest" tag of your image, but you can take advantage of the ECS service deployment functionality to avoid downtime.
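
A minimal sketch of that flow, assuming hypothetical CLUSTER, SERVICE and FAMILY variables and jq on the path (the read-only fields must be stripped before re-registering):

# Pull the current task definition and re-register it unchanged as a new revision.
current=$(aws ecs describe-task-definition --task-definition "$FAMILY" --query 'taskDefinition')
new_def=$(echo "$current" | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')
new_arn=$(aws ecs register-task-definition --cli-input-json "$new_def" --query 'taskDefinition.taskDefinitionArn' --output text)
# Pointing the service at the new revision triggers a rolling deployment that pulls the image again.
aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --task-definition "$new_arn"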

1 vote

It doesn't seem right that I have to create a new revision of my task definition every time, even when nothing in the task definition itself has changed.

There are a bunch of crude bash workarounds for this, which suggests that AWS should have the ECS service scheduler listen for changes/updates to the image, especially in an automated build process.

My crude workaround was to have two identical task definitions and switch between them on every build. That way I don't accumulate redundant revisions.

Here is the specific script snippet that does that.

update_service() {
    echo "change task definition and update service"
    taskDefinition=$(aws ecs describe-services --cluster $CLUSTER --services $SERVICE | $JQ ".services | . [].taskDefinition")
    if [ "$taskDefinition" = "$TASK_DEF_1" ]; then
        newDefinition="$TASK_DEF_2"
    else
        newDefinition="$TASK_DEF_1"
    fi
    rollUpdate=$(aws ecs update-service --cluster $CLUSTER --service $SERVICE --task-definition $newDefinition)
}
0 votes

Have you solved this yet? Perhaps the following will work for you.

When a new release image was pushed to ECR with both a version tag (e.g. v1.05) and the latest tag, the image reference in my task definition had to be explicitly updated to carry the version tag, like :v1.05.

With :latest, the new image did not get pulled by the new tasks after aws ecs update-service --force-new-deployment --service my-service.

I was tagging and pushing like this:

docker tag ${imageId} ${ecrRepoUri}:v1.05
docker tag ${imageId} ${ecrRepoUri}:latest
docker push ${ecrRepoUri}

...whereas this is the proper way of pushing multiple tags, tagging and pushing each one explicitly:

docker tag ${imageId} ${ecrRepoUri}:v1.05
docker tag ${imageId} ${ecrRepoUri}:latest
docker push ${ecrRepoUri}:v1.05
docker push ${ecrRepoUri}:latest

This was briefly mentioned in the official docs without a proper example.

-1 votes

This works great: https://github.com/fdfk/ecsServiceRestart

python ecsServiceRestart.py restart --services="app app2" --cluster=test

-1 votes

The quick and dirty way:

  • log in to the EC2 instance running the task
  • find your container with docker container list
  • use docker restart [container] (a concrete sketch follows below)
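
Put together, it looks roughly like this (the instance address and container ID are placeholders):

ssh ec2-user@<instance-ip>
# Find the ID of the container that belongs to the task you want to restart.
docker container list
# Restart just that container in place.
docker restart <container-id>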