How to Autoscale an Amazon Elastic Kubernetes Service Cluster

In this article, we are going to consider the two most common methods for autoscaling an EKS cluster:

Horizontal Pod Autoscaler (HPA)

Cluster Autoscaler (CA)

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler, or HPA, is a Kubernetes component that automatically scales your service based on metrics such as CPU utilization, as exposed through the Kubernetes Metrics Server. The HPA scales the pods in either a deployment or a replica set, and is implemented as a Kubernetes API resource and a controller. The controller manager periodically compares resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. It obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for any other metrics).

To see this in action, we are going to configure the HPA and then apply some load to our system.

To begin, let us install Helm, a package manager for Kubernetes:

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > helm.sh
chmod +x helm.sh
./helm.sh

Now, we are going to set up the server-side component of Helm, called Tiller. This requires a service account, which we define in a file named tiller.yml:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

The above defines a Tiller service account and binds it to the cluster-admin role. Now let's go ahead and apply the configuration:

kubectl apply -f tiller.yml

Run helm init using the Tiller service account we have just created:

helm init --service-account tiller

With this, we have installed Tiller onto the cluster, which lets Helm manage the resources within it.

With Helm installed, we can now deploy the Metrics Server. The Metrics Server is a cluster-wide aggregator of resource usage data. The metrics are collected by the kubelet on each worker node and are used to drive the scaling behavior of deployments.

So let's go ahead and install that now:

helm install stable/metrics-server --name metrics-server --version 2.0.4 --namespace metrics

Once the Metrics Server pod is up and running, we are ready to scale the application.

For the purpose of this article, we will deploy a special build of Apache and PHP designed to generate CPU utilization:

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80

Here, --requests=cpu=200m requests 200 millicores of CPU for each pod.

Now, let us autoscale our deployment:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

The above specifies that the HPA will increase or decrease the number of replicas, between 1 and 10, to maintain an average CPU utilization of 50% across all pods. Since each pod requests 200 millicores (as specified in the previous command), this corresponds to an average CPU usage of 100 millicores per pod.
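The same policy can also be written declaratively. The following is a minimal sketch, assuming a cluster recent enough to serve the autoscaling/v2 API (Kubernetes 1.23 or later); the file name php-apache-hpa.yaml is hypothetical, and the values simply mirror the kubectl autoscale command above.

```shell
# Write a HorizontalPodAutoscaler manifest that mirrors
# `kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10`.
# The file name is an assumption made for this example.
cat > php-apache-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of each pod's CPU request
EOF
```

Running kubectl apply -f php-apache-hpa.yaml would then take the place of the imperative command, which can be convenient when the policy is kept in version control.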

Let's check the status:

kubectl get hpa

Review the Targets column. If it says <unknown>/50%, the HPA has not yet received any metrics for the pods. Since we are not currently sending any requests to the server, the value should settle at 0% once metrics arrive. This will take a couple of minutes, so let us grab a cup of coffee and come back when we have got some data here.

Rerun the last command and confirm that the Targets column now shows 0%/50%. Now, let's generate some load in order to trigger scaling by running the following:

kubectl run -i --tty load-generator --image=busybox /bin/sh

Inside this container, we are going to send an infinite number of requests to our service. If we flip back over to the other terminal, we can watch the autoscaler in action:

kubectl get hpa -w
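For the load itself, a loop along the following lines can be typed at the shell prompt of the load-generator container. This is a sketch rather than the only way to do it: the function name generate_load is hypothetical, and the default URL assumes the php-apache Service created by --expose earlier is reachable by name from the same namespace.

```shell
# generate_load [URL] [MAX_REQUESTS]
# Requests URL repeatedly. MAX_REQUESTS of 0 (the default) loops until
# interrupted with Ctrl+C. Prints the number of requests sent when bounded.
generate_load() {
  url="${1:-http://php-apache}"
  max="${2:-0}"
  sent=0
  while [ "$max" -eq 0 ] || [ "$sent" -lt "$max" ]; do
    # Discard the response body; only the CPU work on the server matters.
    wget -q -O- "$url" >/dev/null 2>&1
    sent=$((sent + 1))
  done
  echo "$sent"
}
```

Calling generate_load with no arguments keeps requesting the service until it is interrupted, which is what drives the CPU utilization up.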

We can watch the HPA scale the pods up from 1 toward our configured maximum of 10, until the average CPU utilization falls to our target of 50% or below. It takes about 10 minutes, after which we can see 10 replicas running. If we flip back to the other terminal to terminate the load test and then return to the autoscaler terminal, we can see the HPA reduce the replica count back to the minimum.
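The replica counts seen while watching follow from the scaling rule documented for the HPA: desired replicas = ceil(current replicas × current utilization / target utilization), bounded by the configured minimum and maximum. Below is a minimal sketch of that arithmetic. The function name desired_replicas is hypothetical, and the sketch ignores the tolerance band and stabilization delays that the real controller applies.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Utilization values are whole percentages of the pods' CPU request.
desired_replicas() {
  current="$1"; usage="$2"; target="$3"; min="$4"; max="$5"
  # Integer ceiling of current * usage / target.
  desired=$(( (current * usage + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

desired_replicas 1 250 50 1 10   # prints 5
desired_replicas 4 90 50 1 10    # prints 8
desired_replicas 10 10 50 1 10   # prints 2
```

The first call shows why a single pod running at 250% of its request is scaled to five replicas, and the last shows the count shrinking once the load stops.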

Cluster Autoscaler

The Cluster Autoscaler is the Kubernetes component that scales the nodes in a cluster, whereas the HPA scales pods. It automatically increases the size of an Auto Scaling group so that pending pods can be placed successfully. It also tries to remove unused or underutilized worker nodes from the Auto Scaling group.

The following eksctl command creates a node group backed by an Auto Scaling group with a minimum of one node, a desired count of five, and a maximum of ten:

eksctl create nodegroup --cluster <CLUSTER_NAME> --node-zones <AVAILABILITY_ZONE> --name <NODEGROUP_NAME> --asg-access --nodes-min 1 --nodes 5 --nodes-max 10 --managed

Now, we need to apply an inline IAM policy to our worker nodes:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

This allows the EC2 worker nodes hosting the Cluster Autoscaler to inspect and resize the Auto Scaling groups. Copy it and add it to the IAM role of your EC2 worker nodes.
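If you prefer the command line to the console, the policy can be attached with aws iam put-role-policy. The following is a sketch under stated assumptions: the function name attach_autoscaler_policy and the policy name ClusterAutoscalerInline are hypothetical, the policy JSON above is assumed to be saved in a local file, and the role name is whatever IAM role your worker nodes use.

```shell
# attach_autoscaler_policy ROLE_NAME POLICY_FILE
# Attaches the JSON document in POLICY_FILE to ROLE_NAME as an inline policy.
attach_autoscaler_policy() {
  role_name="$1"
  policy_file="$2"
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name ClusterAutoscalerInline \
    --policy-document "file://${policy_file}"
}
```

A call might look like attach_autoscaler_policy <WORKER_NODE_ROLE_NAME> policy.json, with both arguments replaced by your own values.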

Next, download the following file:

wget https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

And update the following line with your cluster name:

       - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
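Instead of editing the file by hand, the placeholder can be substituted with sed. This is a small sketch: the function name set_cluster_name is hypothetical, and it assumes the downloaded manifest still contains the literal text <YOUR CLUSTER NAME> shown above.

```shell
# set_cluster_name FILE CLUSTER_NAME
# Replaces every "<YOUR CLUSTER NAME>" placeholder in FILE with CLUSTER_NAME.
# Writes through a temporary file so it behaves the same with GNU and BSD sed.
set_cluster_name() {
  file="$1"
  cluster="$2"
  sed "s|<YOUR CLUSTER NAME>|${cluster}|g" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}
```

For example, set_cluster_name cluster-autoscaler-autodiscover.yaml <CLUSTER_NAME>, using the same cluster name that was passed to eksctl.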

Finally, we can deploy our Autoscaler:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

Of course, we should wait for the pods to finish creating. Once done, we can scale our cluster out. We will consider a simple nginx application defined in the following YAML file, nginx.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-scale
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources: 
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi

Let's go ahead and deploy the application:

kubectl apply -f nginx.yaml

And check the deployment:

kubectl get deployment/nginx-scale

Now, let's scale the deployment up to 10 replicas:

kubectl scale --replicas=10 deployment/nginx-scale

We can see that some of our pods are in the Pending state, which is the trigger that the Cluster Autoscaler uses to scale out our fleet of EC2 instances.

kubectl get pods -o wide --watch
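To get a feel for how many nodes the scale-out needs, the CPU requests can be divided into the capacity of a node. The sketch below is a rough estimate only: the function name nodes_needed is hypothetical, the figure of 1930 millicores allocatable per node is an assumed example value rather than something taken from this cluster, and memory requests and system pods are ignored.

```shell
# nodes_needed REPLICAS CPU_REQUEST_M NODE_ALLOCATABLE_M
# Estimates the node count needed to fit REPLICAS pods by CPU request alone.
nodes_needed() {
  replicas="$1"; request_m="$2"; allocatable_m="$3"
  # Whole pods that fit on one node.
  pods_per_node=$(( allocatable_m / request_m ))
  if [ "$pods_per_node" -lt 1 ]; then
    echo "request exceeds node capacity" >&2
    return 1
  fi
  # Integer ceiling of replicas / pods_per_node.
  echo $(( (replicas + pods_per_node - 1) / pods_per_node ))
}

nodes_needed 10 500 1930   # prints 4
```

With 500m requested per pod, only three pods fit on such a node, so ten replicas leave pods pending until a fourth node has joined.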

Conclusion

In this article, we considered both types of EKS cluster autoscaling. We learnt how the Horizontal Pod Autoscaler adjusts the number of pod replicas based on CPU utilization, and how the Cluster Autoscaler initiates scale-in and scale-out operations each time it detects under-utilized instances or pending pods. Both are essential features of Kubernetes when it comes to scaling a microservice application. We hope you found this article useful; there is more to come. Till then, happy scaling!

About the author - Sudip is a Solution Architect with more than 15 years of working experience, and is the founder of Javelynn. He likes sharing his knowledge through writing, and while he is not doing that, he must be fishing or playing chess.

Previously posted at https://appfleet.com/.