Three Patterns for an Effective Cloud Native Development Workflow

Written by danielbryantuk | Published 2018/08/30


Many developers are moving towards “cloud native” development, whether that means taking advantage of the services and convenience of the public cloud or deploying services on their own internal cloud. However, the new architectures and technologies emerging as part of the cloud native development space (microservices, containers, orchestrators) require new developer workflow patterns.

In this article I will introduce three patterns that I have found useful as I’ve learned about working with cloud technologies over the past several years.

Creating an Efficient Inner-Development Loop

The ability to define cloud infrastructure as code and provision it on demand has been revolutionary in the way we deploy software. However, although the initialisation of the infrastructure is fast, it is typically not instantaneous (as you might want, say, in a TDD cycle). This means that developers who need infrastructure provisioned in order to complete a build-and-deploy cycle often do not get fast feedback, and task or context switching can become a problem. The solutions to this include simulated local development infrastructure, re-usable remote infrastructure, and local-to-production development.

The simulated local development infrastructure pattern can be seen in AWS SAM Local. This tool provides a CLI and Docker-based replication of a production serverless environment, enabling efficient local development of AWS Lambda functions that use associated AWS services such as Amazon API Gateway and DynamoDB. It can be further augmented with service virtualisation (covered below) to simulate dependent services, and with local simulations of data stores and middleware, for example LocalStack, which provides simulations of AWS services like Kinesis and SQS.
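As a rough illustration of what this enables, the sketch below shows a minimal Lambda handler that reads from DynamoDB. The table name, environment variables, and local endpoint are my own assumptions, but the same handler could be exercised locally with the SAM CLI against a LocalStack-style simulation rather than the real AWS services.

```python
# Minimal Lambda handler sketch (table name and endpoint variable are assumptions).
# When running locally, DYNAMODB_ENDPOINT can point at a LocalStack simulation
# instead of the real AWS service; in AWS it is simply left unset.
import json
import os

import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url=os.environ.get("DYNAMODB_ENDPOINT"),  # e.g. a local LocalStack URL
)
table = dynamodb.Table(os.environ.get("TABLE_NAME", "orders"))  # assumed table name


def handler(event, context):
    """Look up a single item using the id from the API Gateway path parameters."""
    order_id = event["pathParameters"]["id"]
    result = table.get_item(Key={"id": order_id})
    return {
        "statusCode": 200,
        "body": json.dumps(result.get("Item", {})),
    }
```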

The re-usable remote infrastructure pattern is often implemented in a bespoke fashion, with platform teams provisioning multiple test environments that can be leased on demand by engineers. Typically the configuration and corresponding state (data stores) are reset when the lease is completed, which makes the environment ready for use by the next developer. The open source Kubernaut tool provides a similar experience for Kubernetes, maintaining a pool of initialised clusters that can be leased on demand.

The local-to-production development pattern is arguably the most cloud native of the patterns, as it involves a developer coding an application against production. Development and test environments must be as high-fidelity as possible in order to get the most accurate feedback, and the most production-like environment there is, of course, is production itself. Azure Dev Spaces allows an engineer to spin up a managed Kubernetes cluster on demand and connect a local VS Code editor to it. The tool manages the build and deployment of any code changes into a container, which is then deployed into the dev space in near real time.

The CNCF-hosted Telepresence tool allows developers to proxy their local development environment into a Kubernetes cluster, so an engineer can run and debug code and applications locally as if they were running in the cluster. This provides a real-time developer feedback loop, as requests can be made against the production application and the service debugged locally using actual traffic (or shadowed traffic) that is forwarded to the local development environment.
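To make this concrete, the sketch below is a tiny local service you might run under a debugger while Telepresence forwards cluster traffic to it. The service itself is purely illustrative, and the command shown in the comment is based on the Telepresence 1.x CLI, so treat the deployment name and flags as assumptions to check against your own setup.

```python
# Illustrative local service to be debugged against real cluster traffic.
# With Telepresence 1.x this could be swapped in for the in-cluster deployment with
# something like (deployment name and flags are assumptions):
#   telepresence --swap-deployment orders --expose 8080 --run python3 app.py
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/orders/<order_id>")
def get_order(order_id):
    # Set a breakpoint here; requests forwarded from the cluster will hit this code.
    return jsonify({"id": order_id, "status": "debugging-locally"})


if __name__ == "__main__":
    app.run(port=8080)
```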

Isolating the Scope of Testing: Service Virtualisation and Contracts

Cloud native systems are typically developed as modular (service-based) systems, which means testing a single service can be challenging due to the required interaction with external service dependencies. Obviously services should be designed to be as cohesive and loosely coupled as possible, so that they can be developed in isolation. However, when this is not practical, or an engineer wants to drive a more production-like test, techniques like service virtualisation and consumer-driven contracts can be useful patterns.

Modern service virtualisation tools like Hoverfly, WireMock and Mountebank act as proxies that sit between services and systems and capture traffic for later replay. This allows for the execution of tests that span multiple services, and the recording of the associated requests and responses from the dependent services involved. The recording can then be replayed without running the actual dependencies themselves, which is very valuable for running isolated tests in a CI/CD build pipeline. These tools can also be used to generate virtual responses from services that do not yet exist, and Hoverfly allows the injection of faults, which can be used to test the handling of failures in a deterministic fashion.
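As a hedged sketch of the capture-and-replay workflow, the test below routes its outbound HTTP calls through a Hoverfly proxy. It assumes Hoverfly is already running (for example via hoverctl) on its default proxy port of 8500; the downstream service URL and response shape are illustrative.

```python
# Sketch of a test that routes outbound HTTP calls through a Hoverfly proxy.
# In capture mode real responses are recorded; in simulate mode the recorded
# responses are replayed without the real dependency being present, so the same
# test runs unmodified in an isolated CI/CD pipeline.
import requests

HOVERFLY_PROXY = "http://localhost:8500"  # assumed default Hoverfly proxy port
proxies = {"http": HOVERFLY_PROXY, "https": HOVERFLY_PROXY}


def test_fetch_user_profile():
    # The downstream service URL is illustrative; in simulate mode Hoverfly
    # answers on its behalf.
    response = requests.get(
        "http://user-service.internal/profiles/42",
        proxies=proxies,
        verify=False,  # Hoverfly re-signs HTTPS traffic with its own certificate
    )
    assert response.status_code == 200
    assert response.json()["id"] == "42"
```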

Consumer-driven contracts (CDC) can be used not only to drive the design of services outside-in (i.e. TDD for APIs), but also to verify that a service provides the required functionality and does not regress as it evolves. There is an excellent article about this on Martin Fowler's blog, and although the process can appear daunting at first glance, in my experience it becomes quite mechanical once a team has experimented with the approach over a few iterations.
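For a feel of what the consumer side looks like, here is a hedged sketch using the pact-python library. The service names, endpoint and expected body are assumptions for illustration only.

```python
# Hedged consumer-driven contract sketch using the pact-python library.
# Service names, the endpoint, and the expected body are illustrative assumptions.
import atexit

import requests
from pact import Consumer, Provider

# The mock provider runs on a local port and records the expectations as a pact file.
pact = Consumer("OrderWebUI").has_pact_with(Provider("OrderService"), port=1234)
pact.start_service()
atexit.register(pact.stop_service)


def test_get_order():
    (pact
     .given("an order with id 42 exists")
     .upon_receiving("a request for order 42")
     .with_request("GET", "/orders/42")
     .will_respond_with(200, body={"id": "42", "status": "shipped"}))

    with pact:
        # The consumer code under test would normally make this call.
        response = requests.get("http://localhost:1234/orders/42")

    assert response.json()["status"] == "shipped"
```

The generated pact file can then be verified against the real provider in its own build pipeline, which is what catches regressions as the provider evolves.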

Validating Functionality with Canary Releases

Cloud native systems are complex and constantly evolving, so testing in pre-production typically cannot provide complete validation of functionality and of interactions with what is currently running in production. The solution to this problem is to reduce the impact of deployment by canarying: initially routing only a small fraction of production traffic to the new service (the “canary in the coal mine”), observing its behaviour and other KPIs, and then incrementally increasing the percentage of traffic until the new service is taking all of it.

For developers working with Kubernetes, the open source Ambassador API gateway, which is built on the Envoy Proxy, provides canary testing functionality driven via simple annotations on Kubernetes services. The Istio service mesh also provides canarying, but this has to be configured outside of Kubernetes. With a little glue code, both of these systems can provide an automated canary release of functionality, and an automated rollback if issues are detected.
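To make the “glue code” idea concrete, here is a hedged sketch that gradually increases the canary weight in an Ambassador Mapping annotation on a Kubernetes Service and rolls back if a health check fails. The service names, the Mapping fields (which depend on your Ambassador version) and the check_error_rate helper are assumptions for illustration.

```python
# Hedged canary "glue code" sketch: gradually shift traffic to a canary service by
# patching the weight in an Ambassador Mapping annotation, rolling back on errors.
# Service names, Mapping fields and check_error_rate() are illustrative assumptions.
import time

from kubernetes import client, config

MAPPING_TEMPLATE = """
---
apiVersion: ambassador/v0
kind: Mapping
name: orders_canary_mapping
prefix: /orders/
service: orders-canary
weight: {weight}
"""


def set_canary_weight(v1: client.CoreV1Api, weight: int) -> None:
    body = {"metadata": {"annotations": {
        "getambassador.io/config": MAPPING_TEMPLATE.format(weight=weight)}}}
    v1.patch_namespaced_service("orders-canary", "default", body)


def check_error_rate() -> bool:
    """Placeholder: query your metrics system (e.g. Prometheus) for the canary's error rate."""
    return True


def run_canary_release():
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for weight in (5, 25, 50, 100):
        set_canary_weight(v1, weight)
        time.sleep(300)  # observe behaviour and KPIs at each step
        if not check_error_rate():
            set_canary_weight(v1, 0)  # roll back: send all traffic to the stable service
            return
```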

For developers working with serverless code, similar functionality is offered by many of the cloud vendors. For example, AWS Lambda provides traffic shifting using function aliases, which can be orchestrated to provide a canary rollout. As with the Kubernetes approach above, a developer can write some glue code to automate gradual releases and rollbacks based on AWS CloudWatch metrics.
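The equivalent glue code for Lambda might look something like the sketch below, which uses boto3 alias traffic shifting and a simple CloudWatch Errors check. The function name, alias, version numbers and rollout steps are assumptions for illustration.

```python
# Hedged sketch of a gradual Lambda rollout using alias traffic shifting via boto3.
# Function name, alias, versions and the CloudWatch check are illustrative assumptions.
import time
from datetime import datetime, timedelta, timezone

import boto3

lam = boto3.client("lambda")
cloudwatch = boto3.client("cloudwatch")


def errors_detected(function_name: str) -> bool:
    """Check the Lambda Errors metric over the last five minutes."""
    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=300,
        Statistics=["Sum"],
    )
    return any(point["Sum"] > 0 for point in stats["Datapoints"])


def canary_rollout(function_name="orders", alias="live", old_version="1", new_version="2"):
    for weight in (0.05, 0.25, 0.5):
        # Keep the alias pointing at the old version, but shift a fraction of traffic
        # to the new version via the alias routing configuration.
        lam.update_alias(
            FunctionName=function_name,
            Name=alias,
            FunctionVersion=old_version,
            RoutingConfig={"AdditionalVersionWeights": {new_version: weight}},
        )
        time.sleep(300)
        if errors_detected(function_name):
            lam.update_alias(FunctionName=function_name, Name=alias,
                             FunctionVersion=old_version,
                             RoutingConfig={"AdditionalVersionWeights": {}})
            return
    # Promote: route all traffic to the new version.
    lam.update_alias(FunctionName=function_name, Name=alias,
                     FunctionVersion=new_version,
                     RoutingConfig={"AdditionalVersionWeights": {}})
```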

