Automation of Draining ECS instances with Lambda, based on Autoscaling Group Lifecycle hooks or Spot Instance Interruption Notices
Based on the original idea from the AWS Blog post and GitHub, with the following differences:
Autoscaling Hooks events are received via CloudWatch rules, which makes it possible to use one function for draining many ECS Clusters
Serverless Framework based
Written in Golang
Supports the draining of Spot based ECS instances via Spot Instance Interruption Notice
When the AMI for the ECS instances is updated, the ASG replaces them without "Draining", which may cause a short downtime of the deployed containers. This function automates the drain process for ECS Cluster instances.
ecs-drain-lambda function:
Receives an Auto Scaling lifecycle termination event from CloudWatch Events (the autoscaling:EC2_INSTANCE_TERMINATING hook should be configured on your ASG) or a Spot Instance Interruption Notice
Gets the ID of the instance that has to be terminated
Looks for the ECS Cluster name in the UserData in the following format: ECS_CLUSTER=xxxxxxxxx
If any ECS Tasks are running on the instance, starts the Drain process
Waits for all the ECS Tasks to shut down
Completes Lifecycle Hook, which lets ASG proceed with instance termination
GNU Make
Configured EC2 Auto Scaling Lifecycle Hooks for the autoscaling:EC2_INSTANCE_TERMINATING event on your ASG
Example CloudFormation resource:
ASGTerminateHook:
  Type: "AWS::AutoScaling::LifecycleHook"
  Properties:
    AutoScalingGroupName: !Ref ECSAutoScalingGroup
    DefaultResult: "ABANDON"
    HeartbeatTimeout: "900"
    LifecycleTransition: "autoscaling:EC2_INSTANCE_TERMINATING"
Clone the repo with git clone
Enter the project directory cd ecs-drain-lambda
Run make deploy
Note: the us-east-1 region is selected by default. To deploy to a different region, use sls deploy -v --region ${AWS_REGION}
If you want to deploy the ecs-drain-lambda function with Terraform, there is an ecs-drain-lambda Terraform Module.
The function waits up to 15 minutes for the drain to complete and then fails with a timeout.
If the function fails, the default lifecycle hook action is triggered (ABANDON or CONTINUE, depending on your hook configuration); either result ends in eventual instance termination.
If the instance is terminating, both ABANDON and CONTINUE allow the instance to terminate. However, ABANDON stops any remaining actions, such as other lifecycle hooks, while CONTINUE allows any other lifecycle hooks to complete.