Top 6 Tips for Increasing the Reliability of Your Application in Kubernetes

Written by arslanbekov, Head of SRE @ANNA Money | Published by HackerNoon on 2023/02/24

TL;DR: This article serves as a checklist for anyone using Kubernetes or planning to integrate it. It's critical to properly provision your deployments to achieve zero-downtime deployments, and your application must be ready for this: it must not store any state in itself. The more replicas you run, the better. You must also declare the minimum and maximum resources needed for your deployment.

Our previous article discussed the Top 5 DevOps Tools and Services to Consider as a Startup. While increasing the speed of your Software Development Life Cycle (SDLC) is important, it's also vital to keep your application available to users.

If you've successfully sped up your SDLC, you must think about seamlessly updating code on your servers without impacting the user. It is also vital to understand how you can increase the availability of your application. This article serves as a checklist for anyone using Kubernetes or planning to integrate it. It's critical to properly provision your deployments to achieve zero-downtime deployments, ensuring that your code is delivered seamlessly to the user and that your applications self-heal in case of problems.

This article will discuss the six most important points to consider when deploying on Kubernetes to achieve zero-downtime deployments and increased availability.

Rolling Updates

Kubernetes Deployments default to the RollingUpdate strategy, which incrementally replaces old pod instances with new ones so that the deployment happens without downtime. To customize this strategy in the deployment configuration, you can adjust maxSurge, which defines the maximum number of pods that can be added at a time, and maxUnavailable, which determines the maximum number of pods that can be unavailable during the rolling update.

To minimize disruptions to your workloads, it's recommended to avoid the Recreate strategy for most of them. With Recreate, Kubernetes deletes the old pods first and only then creates the new ones, which results in downtime. This approach may be suitable if you use volumes that can only be attached to one pod at a time.
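
As a sketch, a strategy block that adds at most one extra pod during an update and never takes a running pod out of service might look like this (tune the values for your workload):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one pod above the desired replica count during the update
      maxUnavailable: 0  # no pods taken out of service before their replacements are ready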

Resource Declaration

Declaring the minimum and maximum resources required for your deployment is best practice.

The Kubernetes scheduler uses this information to check resource availability and find a suitable node for your pods. If your pod's specification doesn't indicate how many resources it requires, the pod may be scheduled onto an already overcommitted node. To avoid this, accurately specify the minimum (requests) and maximum (limits) resources your deployment needs:

...
spec:
  containers:
  - name: app  # example container name
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"

Replicas

Running a minimum of three replicas of any application deployment is advisable. If one or two replicas fail, there is always a third that remains up and running, ensuring that your application continues to function without downtime.

To run many replicas, it is essential to understand that your application must be ready for this: it must not store any state in itself. In other words, it must be stateless.

As a rule of thumb, the more replicas you run, the more failures your application can tolerate.
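
Setting the replica count is a one-line change in the Deployment spec, shown here as a minimal fragment (the rest of the spec is omitted):

spec:
  replicas: 3  # at least three, so the app survives the loss of one or two pods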

podAntiAffinity

Deploying multiple replicas does not guarantee that they will be scheduled on different nodes, which can be problematic.

For example, your service will become unavailable if all three replicas land on the same node and that node goes down. To prevent this, it's best to define the podAntiAffinity field in the deployment configuration file. For example, if you want the nginx-ingress pods to run on different nodes, you can declare a podAntiAffinity field in the nginx-ingress Deployment.

Pod anti-affinity can be defined in two ways:

  1. requiredDuringSchedulingIgnoredDuringExecution: In this case, the scheduler will always ensure that the matching pods don't end up on the same node; if there is not enough capacity on distinct nodes, the excess pods will remain unscheduled.

    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx-ingress
          topologyKey: kubernetes.io/hostname
    
  2. preferredDuringSchedulingIgnoredDuringExecution: In this case, the scheduler will try to schedule the pods on different nodes, but it will not fail if it cannot find a suitable node.

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx-ingress
            topologyKey: kubernetes.io/hostname
    

Liveness and Readiness Probes

In Kubernetes, probes allow you to check the health and availability of your containers running in a pod. Kubernetes provides two types of probes: liveness probes and readiness probes.

  1. Liveness Probes are used to determine if a container in a pod is still running. If the liveness probe fails, Kubernetes will kill and restart the container. This can be useful when the container is stuck in an unresponsive state.
  2. Readiness Probes are used to determine if a container in a pod is ready to serve requests. If the readiness probe fails, the container is removed from the pool of available endpoints. This can be useful when the container needs time to initialize before it can start serving traffic.

Both liveness and readiness probes are defined in the deployment configuration file for your Kubernetes pods. Using probes, you can ensure that your containers are always available and ready to serve requests, improving your applications' overall reliability and availability.

spec:
  containers:
  - name: app  # example container name
    ports:
    - name: probe  # named port referenced by the probes below
      containerPort: 8080
    livenessProbe:
      httpGet:
        path: /live
        port: probe
      initialDelaySeconds: 5
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: probe
      periodSeconds: 5

Auto Scaling

Auto-scaling allows Kubernetes to automatically adjust the number of replicas based on CPU utilization, memory usage, or other custom metrics. By implementing auto-scaling, you can ensure that your application can handle increased traffic without downtime.

Kubernetes supports two types of autoscaling:

  1. Horizontal Pod Autoscaler (HPA) scales the number of replicas of a workload based on CPU or memory usage. You can define the minimum and maximum number of replicas and the target CPU or memory utilization.
  2. Vertical Pod Autoscaler (VPA) adjusts the resource requests and limits of containers based on their actual usage. For example, if a container consistently uses more CPU than requested, VPA will automatically increase the container's CPU request.

Autoscaling can be enabled by creating an autoscaler object in your Kubernetes cluster and attaching it to a Deployment, ReplicaSet, or StatefulSet.

Once the autoscaler is configured, Kubernetes will monitor the resource utilization of the pods and automatically adjust the number of replicas to maintain the target utilization. Using autoscaling, you can ensure that your application is always available and responsive, even under high load or fluctuating resource usage.
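
As a sketch, here is an HPA manifest (autoscaling/v2 API) that keeps a hypothetical Deployment named app between 3 and 10 replicas, targeting roughly 70% average CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app          # the Deployment to scale (assumed to exist)
  minReplicas: 3       # never fewer than three replicas (see the Replicas section)
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale out when average CPU exceeds 70% of requests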


Conclusion

In conclusion, deploying an application on Kubernetes can significantly increase its reliability by leveraging the platform's built-in features to manage and scale containerized workloads. By following best practices such as using more replicas, defining minimum and maximum resources, and using rolling updates, you can ensure that your application is always available and responsive.

In addition, Kubernetes provides many tools and mechanisms, such as pod anti-affinity, probes, and autoscaling, that can further improve application reliability and availability. Using these features and constantly monitoring your application's performance and resource usage, you can proactively address potential issues and minimize the risk of downtime.

Overall, implementing these strategies and leveraging the features of Kubernetes can help you build robust and resilient applications that can scale with changing requirements and provide a positive user experience.

