

Complete EKS cluster [Terraform]

Getting started with a functional EKS cluster from scratch can be challenging, as it requires some specific settings. While the EKS module will create a new cluster, it does not address how you will expose an application, the tags required on subnets, the number of pod IP addresses, etc.
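For reference, these are the subnet tags the AWS Load Balancer Controller uses to auto-discover which subnets a load balancer can be placed in. They are shown as a plain YAML map for readability. The key names `public_subnet_tags` and `private_subnet_tags` mirror the inputs of the community `terraform-aws-modules/vpc` module, which is an assumption about how the repository builds its VPC; the exact values it uses are not shown here.

```yaml
# Public subnets: candidates for internet-facing load balancers
public_subnet_tags:
  kubernetes.io/role/elb: "1"
# Private subnets: candidates for internal load balancers
private_subnet_tags:
  kubernetes.io/role/internal-elb: "1"
```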

🖥 EKS cluster using Terraform contains everything required for you to spin up a new cluster and expose an application via an Application Load Balancer. All you need to do is apply the Terraform code.

Source Code, Sample app

  • [x] VPC with 2 private and 2 public zones
  • [x] EKS cluster with a managed node group (1 node)
  • [x] VPC CNI add-on with prefix delegation
  • [x] AWS Load Balancer Controller
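With prefix delegation, the VPC CNI attaches /28 IPv4 prefixes (16 addresses each) to node network interfaces instead of individual secondary IPs, which increases the number of IP addresses available for pods on each node. A minimal sketch of the `vpc-cni` add-on configuration values that turn it on, assuming the add-on's `env` configuration schema; `WARM_PREFIX_TARGET` is an optional tuning value and is not taken from the source.

```yaml
# Configuration values for the EKS vpc-cni add-on (sketch)
env:
  # Assign /28 prefixes to ENIs instead of individual secondary IPs
  ENABLE_PREFIX_DELEGATION: "true"
  # Keep one spare prefix allocated so new pods get addresses quickly
  WARM_PREFIX_TARGET: "1"
```

In Terraform, a map like this is typically JSON-encoded and passed to the add-on through the `configuration_values` argument of `aws_eks_addon`.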

EKS: avoid errors and timeouts during deployment (ALB)

Scenario

An EKS cluster is configured with an Application Load Balancer. During deployments, pods become unhealthy in the target group for a short while, causing a brief outage.

(Screenshot: target group status)

Root cause

There are two possible causes for this scenario, and both must be addressed:

  1. The ALB takes a while to register new pods and mark them healthy.
  2. The ALB is slow to detect and drain terminated pods.

Solution

Enable pod readiness gate

Configure a pod readiness gate to indicate that the pod is registered with the ALB/NLB and healthy enough to receive traffic. This ensures a new pod is healthy in the target group before the old pod is terminated.

To enable the pod readiness gate, add the label elbv2.k8s.aws/pod-readiness-gate-inject: enabled to the application's Namespace. The change takes effect only for pods created after the label is added.

apiVersion: v1
kind: Namespace
metadata:
  name: my-app   # hypothetical namespace name
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
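To check that injection worked, inspect a pod created after the label was added. The controller's webhook adds an entry to `spec.readinessGates`, and the pod only reports Ready once the matching condition becomes True, i.e. after its target passes the ALB health check. The excerpt below is an illustrative sketch: the suffix after `target-health.elbv2.k8s.aws/` is generated by the controller (it is the TargetGroupBinding name), so the value shown is a placeholder.

```yaml
# Excerpt of a pod after readiness gate injection (illustrative)
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/<target-group-binding-name>
status:
  conditions:
    # Set to "True" by the controller once the target is healthy in the ALB
    - type: target-health.elbv2.k8s.aws/<target-group-binding-name>
      status: "True"
```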

Pod lifecycle preStop

When a pod is terminated, it can take a couple of seconds for the ALB to pick up the change and start draining connections. By that time, Kubernetes has most likely already stopped the pod. The solution is a workaround: add a preStop lifecycle hook that delays shutdown, so the pod keeps serving until it has been de-registered from the target group.

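    spec:
      # Must be longer than the preStop sleep: the grace period countdown
      # includes the preStop hook, and the container is killed when it expires
      terminationGracePeriodSeconds: 90
      containers:
        - name: app        # hypothetical container name
          lifecycle:
            preStop:
              exec:
                # Keep the pod alive while the ALB de-registers and drains it
                command: ["/bin/sh", "-c", "sleep 60"]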
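Images without a shell (for example, distroless images) cannot run `/bin/sh -c "sleep 60"`. On clusters where the built-in `sleep` lifecycle action is available (the PodLifecycleSleepAction feature, beta from Kubernetes 1.30), the same delay can be expressed without `exec`. A sketch, with a hypothetical container name:

```yaml
    spec:
      terminationGracePeriodSeconds: 90   # longer than the preStop sleep
      containers:
        - name: app                       # hypothetical container name
          lifecycle:
            preStop:
              # Built-in sleep action: no shell needed inside the image
              sleep:
                seconds: 60
```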

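Set the target group de-registration delay to be shorter than the preStop sleep by adding the target group attribute deregistration_delay.timeout_seconds (the ELB default is 300 seconds) through the alb.ingress.kubernetes.io/target-group-attributes annotation: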

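ingress:
  enabled: true
  # Typical chart templates render this as spec.ingressClassName, so the
  # deprecated kubernetes.io/ingress.class annotation is not needed
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    # The HTTPS listener also needs a certificate, either via
    # alb.ingress.kubernetes.io/certificate-arn or discovered from the Ingress hostnames
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    # Register pod IPs directly as targets
    alb.ingress.kubernetes.io/target-type: ip
    # Shorter than the 60 s preStop sleep, so draining finishes before the pod exits
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30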
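The values above feed an Ingress template in the application's Helm chart. Without a chart, the same settings can be placed directly on an Ingress manifest. A minimal sketch showing only the annotations relevant to this fix; the names `my-app` and port 80 are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app              # hypothetical name
  namespace: my-app         # the namespace labelled for readiness gate injection
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # Keep this below the preStop sleep duration
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app    # hypothetical Service
                port:
                  number: 80
```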