Managing Microservices with Service Mesh: A Control Plane for your Application

Written by haridas | Published 2020/07/01

TL;DR: Monolithic architectures become hard to scale and maintain as their code bases grow. Microservices split an application into small, independently deployable services, but they bring significant operational complexity. A service mesh abstracts that complexity away from developers and acts as a control plane for the application.

Applications built on monolithic/3-tier/n-tier architecture often fail to meet market demands in terms of scaling and performance. This is generally attributed to the inflexible nature of these architectures, where the code base becomes unmanageable for various reasons: the addition of new features, hard-to-identify dependencies, side effects that crop up under scaling, and so on. In these environments, adopting new technologies and making changes take a long time. The bottom line is that such applications are less agile and feel outdated.
Microservice architecture is widely seen as the remedy, with business logic handled by separate services. It helps overcome the issues faced by monoliths (where business and system logic are bundled together) by splitting the application into multiple small components, each handling a specific task and exposing standard APIs. This makes it easier to focus on the hotspots in your application and to enable horizontal scaling where required. Having said that, managing a microservice architecture is not as simple as it looks.

A glimpse of monoliths, microservices and everything in between

Before delving deeper into managing microservices, let us take a look at what monoliths and microservices are capable of, and take stock of their pros and cons.

Monolithic architecture

  • A monolithic application keeps its entire business logic in a single code base.
  • It is deployed as a single entity or service.
Pros: Low operational complexity. Works well during the initial phase of application development, when a few components are sufficient.
Cons: Scaling the capacity (horizontal scaling) of the application is a challenge, as it involves handling multiple instances of a large code base. Increasing the development team size is another challenge, because it is hard for new members to understand the complexities of existing code.
An enhanced version of the monolith is the n-tier application, where both vertical and horizontal scaling are possible. However, there are bottlenecks at the database (DB) and load balancer (LB) levels.

Microservices 

  • Microservices are a natural evolution from n-tier applications.
  • The components are segmented so that there is no need to touch all of them when making changes to a specific part of the application.
  • Modern operational techniques bring down the complexity of managing multiple microservices, enabling progressive updates, zero-downtime rollouts, etc.
Pros: Each microservice can scale individually based on its demand. Development teams can work in parallel on their areas of competence and roll out services independently. This favors horizontal scalability and better resource utilization.
Cons: Complex operational requirements; managing the entire system demands strong visibility into it.

Managing microservices in a modern application 

Most modern applications are microservice based, and they may depend on other SaaS and PaaS systems too. Key components of this architecture include:
  1. Technology agnostic frontend components (web, mobile or other clients)
  2. Authentication APIs
  3. APIs for the different services

Operational complexities

A microservice-based architecture is the way to build modern applications thanks to its flexibility in scaling and other resource-utilization benefits. But when it comes to operational requirements, things get complex because there are many moving parts. Operating the system means taking care of all those components, their releases and upgrades, while ensuring their health at the same time. These factors directly increase complexity as the system scales, because the dependencies grow too. The major complexities arise in:
  • Packaging
  • Managing heterogeneous environments 
  • Continuous integration and incremental rollouts
These are also considered standard operational requirements for rolling out an application in a microservice environment. While Docker and similar container technologies help overcome heterogeneous environments, platforms like Kubernetes provide the orchestration instruments needed to simplify the remaining complexities.
To get a good understanding of the system and make proactive decisions, the Site Reliability Engineer (SRE) needs to monitor and measure the factors given below in a production environment (a minimal sketch of some of them follows the list):
  1. Metrics
  2. Requests per second (RPS) from different services
  3. Data volume per service
  4. Request failure vs. success rate
  5. Transparent security (TLS/SSL)
  6. Zero-downtime rollout 
  7. Intelligent load balancing
  8. Service discovery
  9. Retry/timeout implications
  10. A/B testing for different services
  11. Visibility into service latency
  12. Distributed tracing
  13. Circuit breaker
  14. Retry storm
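Some of these items, such as retries with timeouts (9), circuit breaking (13), and retry storms (14), are easier to grasp in code. Below is a minimal Python sketch of the logic a sidecar typically applies on a service's behalf; the names and thresholds are invented for illustration and not taken from any particular mesh.

import random
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive failures,
    then let a trial request through after `reset_after` seconds."""
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cool-down expires.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

def call_with_retries(call, breaker, attempts=3, base_delay=0.2):
    for attempt in range(attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = call()  # call() should enforce its own timeout
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            # Exponential backoff with full jitter: spreading retries
            # out in time is what prevents a coordinated retry storm.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("all retries exhausted")

The point of the sketch is that none of this logic has to live in the service itself; the mesh applies it uniformly to every service from the outside.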
Some of the items listed are handled during application development itself. For example, enabling SSL/TLS to secure communication with a service is done at the development stage, so the control lies with the developer. Non-adherence to the standards specified by the security team then becomes a weak point in the system. Giving the operations team full control over security would be the cleaner method, since it is their responsibility rather than the developer's.
Similarly, it is possible to bring all the items listed above under the operations team's control by abstracting them via tools. That is exactly what a service mesh does.

Service mesh

A service mesh taps in and solves most of these SRE problems. It provides full visibility into the production system, based on which an SRE can instrument it, make proactive decisions to scale up or down, or take other key actions to sustain SLAs or other objectives specific to your application. All of this is possible without changing the application code or business logic.
In this type of environment, service developers need not worry about securing ingress and egress requests, as that is already taken care of by the service mesh. Similarly, cluster-aware load balancing, service discovery, etc. are handled for them. Taking these complexities and platform-awareness requirements out of the service developer's hands makes them more productive and lets them concentrate on business logic. This is what a service mesh does: it offers a set of proxies that services use to abstract away the network requirements. The proxies and the components of the service mesh are described below.
Control and Data plane
A service mesh has two main components: the control plane and the data plane, the latter consisting of sidecar proxies. The separation is based on responsibilities.
Sidecar
A sidecar, as the name implies, is a proxy that rides along with a service like the sidecar of a motorcycle. These proxies are deployed at the infrastructure layer and let the services route their requests through them instead of reaching the network directly. The sidecars carry out all the actions required for the ingress and egress traffic of a given application, following the rules provided by the service mesh's control plane. They are mainly responsible for service discovery, health checking, request routing, authentication and authorization of requests, load balancing, and observability.
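To make the sidecar idea concrete, here is a toy Python stand-in for one: the application sends its outbound requests to a proxy on localhost, which forwards them upstream. The port and the upstream address are invented for the example; real meshes use dedicated proxies such as Envoy or linkerd-proxy and intercept traffic transparently at the network layer.

import http.server
import urllib.request

UPSTREAM = "http://orders.internal:8080"  # hypothetical target service

class Sidecar(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the request to the upstream service.
        try:
            with urllib.request.urlopen(UPSTREAM + self.path, timeout=2.0) as resp:
                body, status = resp.read(), resp.status
        except Exception:
            body, status = b"upstream unavailable\n", 502
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body)
        # A real sidecar would also emit metrics (latency, status codes)
        # to the control plane at this point.

if __name__ == "__main__":
    # The application talks to 127.0.0.1:15001 instead of the network.
    http.server.HTTPServer(("127.0.0.1", 15001), Sidecar).serve_forever()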
Now, you can think of the data plane as the worker that does the actual magic on the ground: the sidecar proxies running alongside the services are the data plane of the service mesh. The control plane manages the data plane and gives it the required instructions based on operational requirements. It also supplies the management tools to collect and visualize metrics and to change configuration dynamically, if required. Basically, the control plane offers a full view of what's happening in the system.
The control plane components run separately and manage all the sidecars. So on a cluster there is one control plane and N data-plane proxies, matching the number of service instances; in other words, every replica of a service has an accompanying sidecar. The sketch below illustrates this split.
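The split can be sketched in a few lines of Python. The classes and rule names below are purely illustrative; real control planes (such as Istio's istiod or Linkerd's controller) push configuration to the proxies over gRPC rather than having them poll an in-process object.

import threading

class ControlPlane:
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 1
        self.rules = {"orders": {"timeout_s": 2, "retries": 3}}

    def update(self, service, rule):
        # An operator changes a rule; sidecars pick it up on the next sync.
        with self._lock:
            self.rules[service] = rule
            self.version += 1

    def snapshot(self):
        with self._lock:
            return self.version, dict(self.rules)

class Sidecar:
    def __init__(self, name, control_plane):
        self.name, self.cp = name, control_plane
        self.version, self.rules = 0, {}

    def sync(self):
        # Pull-based sync for simplicity; real data planes receive pushes.
        version, rules = self.cp.snapshot()
        if version != self.version:
            self.version, self.rules = version, rules
            print(f"{self.name}: applied config v{version}: {rules}")

cp = ControlPlane()
sidecars = [Sidecar(f"sidecar-{i}", cp) for i in range(3)]  # one per replica
for s in sidecars:
    s.sync()
cp.update("orders", {"timeout_s": 1, "retries": 2})  # ops tightens the rule
for s in sidecars:
    s.sync()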
Picture the high-level view of your application stack with a service mesh. Some service meshes support environments outside Kubernetes, but it's more favourable to use Kubernetes, as it provides all the instrumentation needed to manage operational pipelines.
If you zoom into your application further, you can see where the sidecars and the control plane run. As mentioned, every instance of a service has a proxy or sidecar associated with it to manage the ingress and egress traffic, and a control plane gives instructions to the sidecars.
[Figure: network architecture of an application with a service mesh]

Key Providers

Istio and Linkerd are the two major service meshes available in the market. Istio democratised the concept of the service mesh and showcased its importance in microservice environments; it is backed by Google, Lyft and IBM. Linkerd, a Cloud Native Computing Foundation (CNCF) project, is a simpler alternative to Istio and has been gaining traction.

Control plane for your application

Using a service mesh to manage the microservices in an application is by now a widely known practice. Though a service mesh is not generally used outside Kubernetes, treating it as a control plane at the application level certainly takes the developer burden off in terms of (see the traffic-splitting sketch after this list):
  1. Identifying the service dependencies
  2. Handling request retries (retry-storm scenarios)
  3. Dealing with request timeouts
  4. Enabling HTTPS/TLS transparently to the microservice
  5. Handling rate limiting of a service
  6. Performing A/B testing
  7. Collecting metrics
  8. Applying dynamic load-balancing rules based on system metrics
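As a small illustration of item 6, a weighted A/B (or canary) routing rule boils down to a weighted choice between two versions of a service. In a real mesh this lives in declarative configuration rather than in application code; the endpoints and weights below are hypothetical.

import random
from collections import Counter

ROUTES = [
    ("http://reviews-v1.internal:9080", 90),  # 90% of traffic to v1
    ("http://reviews-v2.internal:9080", 10),  # 10% canary traffic to v2
]

def pick_backend():
    endpoints, weights = zip(*ROUTES)
    return random.choices(endpoints, weights=weights, k=1)[0]

# Rough check of the split over 10,000 simulated requests.
print(Counter(pick_backend() for _ in range(10_000)))

Because the split is expressed as mesh configuration, operations can shift the weights gradually (say, from 90/10 to 50/50) without redeploying either version.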
Long story short, a service mesh can be used to manage and run an application in a production environment, making proactive decisions and taking the right actions based on system behavior. It even helps in tracing and debugging microservice APIs. The service mesh offers many features for managing the system and better visibility into production activity. From this perspective, the service mesh becomes the control plane for an application.
These kinds of capabilities are not new; they existed even before the concept of the service mesh. But they were tightly coupled and built for particular microservice environments. The service mesh factors out the common parts so that they can be reused in any microservice environment without much friction.

References

  1. https://servicemesh.io
  2. https://istio.io/
  3. https://linkerd.io/
  4. https://landing.google.com/sre/sre-book/toc/

Written by haridas | Architect at imaginea.com specialised in ML, Data Engineering and MLOps.
Published by HackerNoon on 2020/07/01