How we built a multicloud cross-cluster system with Azure Arc
Our clients are aware of the benefits offered by cloud solutions, such as scalability, flexible pricing, security by design and automatic updates. Unfortunately, migrating to the cloud is not always an option. Organizations may be apprehensive about lock-in to a single cloud vendor, may process sensitive data constrained by local regulations, or may have already invested in on-prem infrastructure. Fortunately, there are solutions that minimize these disadvantages and increase the benefits of cloud migration. Let’s talk about multicloud solutions based on one of our engagements.
Multicloud architecture allows using resources such as compute or storage from multiple cloud providers from a single, central location. Federating public and private clouds enables us to build hybrid solutions that increase flexibility and diversification, and provide more options in risk management.
Vendor lock-in is minimized, as organizations have the option to move workloads between vendors. Costs can be optimized across a wider selection of competing platforms, and disaster recovery environments can be deployed on infrastructure that is independent of the main provider. Hybrid solutions make it possible to deploy cloud technologies in scenarios where classified or sensitive data cannot be processed outside the company network, or must be stored only in a given location because of company policies, compliance requirements or local regulations.
Recently, we designed an e-commerce platform that is built on a microservices architecture and deployed across two Kubernetes clusters. We used Azure Arc to maintain the clusters, which are deployed in Azure and Google Cloud Platform. Thanks to Azure Arc, the platform can be extended with any other configured Kubernetes cluster, located with any cloud provider, in any region, or even on-prem.
Application workloads are deployed in both clusters and are orchestrated separately by the individual Kubernetes control planes. The main entry point to the application currently uses Azure Front Door, but it could be any gateway that is able to route traffic to a Kubernetes Ingress. At this point, the load is balanced between clusters according to the chosen high availability strategy, such as active/active or active/passive. The same entry point can also serve as a key component of deployment strategies such as blue/green, A/B testing or canary releases.
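To make the routing strategies above more concrete, here is a minimal Python sketch of how an entry point might choose between two clusters. It is a simplified model and not the actual Front Door algorithm: the `Backend` type, the `priority` and `weight` fields, and the cluster names `aks` and `gke` are assumptions introduced only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str            # hypothetical cluster label, e.g. "aks" or "gke"
    priority: int = 1    # lower number is preferred (used for active/passive)
    weight: int = 1      # relative traffic share within a tier, must be > 0
    healthy: bool = True


def pick_backend(backends, rng=random):
    """Pick one healthy backend: best priority tier first, then by weight."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    best = min(b.priority for b in healthy)
    tier = [b for b in healthy if b.priority == best]
    # Weighted random choice inside the preferred tier.
    return rng.choices(tier, weights=[b.weight for b in tier], k=1)[0]


# Active/passive: the second cluster only receives traffic if the first fails.
ACTIVE_PASSIVE = [Backend("aks", priority=1), Backend("gke", priority=2)]

# Canary release: both clusters are active, roughly 10% of requests go to "gke".
CANARY = [Backend("aks", weight=90), Backend("gke", weight=10)]
```

In this model, active/active is simply the case where both backends share the same priority, and a blue/green switch amounts to swapping the priorities of the two clusters.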
As you can see in the diagram above, the clusters are not entirely isolated. The communication layer has been created using a service mesh, which is also responsible for telemetry and internal traffic management.
In the following sections, we will explain the components of our solution and their roles. We will also briefly discuss how they can help mitigate some common challenges.
Orchestrating workloads – Managed Kubernetes
Kubernetes is currently the most popular container orchestrator on the market and the de facto industry standard for developing microservices-based solutions. Its ability to schedule workloads across a cluster of multiple compute machines fits perfectly into the multicloud landscape.
Instead of deploying the solution in a single cluster installed across VMs of multiple providers, we decided to use managed clusters. Most vendors offer them at a good price, they can be upgraded and scaled automatically, and they are integrated out of the box with vendor services such as load balancers.
Kubernetes clusters management – Azure Arc
When an organization chooses multiple smaller clusters instead of a single big one, maintaining them becomes a new challenge. We chose Azure Arc-enabled Kubernetes to maintain multiple Kubernetes clusters from a single location. It allows us to register Kubernetes clusters from different vendors, as well as clusters hosted on-premises, and maintain them in the same fashion as other Azure resources.
In the following sections of this article, we will cover some of the benefits that we have gained from this integration: GitOps, telemetry and policies.
Communication – Service Mesh
We decided to use Istio as the service mesh, but any other service mesh could be used in its place. We use it mainly for maintaining secure communication between Pods and for internal traffic management. With a service mesh, organizations can improve security, productivity, reliability and observability.
Internal Pod communication allows for an asymmetric architecture, where each cluster contains only the subset of microservices that is specific to the cluster location. For example, microservices working on sensitive data can run only in a Kubernetes cluster deployed on-prem. They have access to storage available in the local network and expose APIs with aggregated data that can be used by microservices deployed in the public cloud.
Istio provides a central place for configuring routes, timeouts, retries, request throttling and circuit breaker rules, all of which must be considered in a high availability microservice environment. This offloads these concerns from developers, who can now focus on the business logic and leave the traffic rules to Ops.
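As an illustration of such centrally managed rules, the following Python sketch builds two Istio resources as plain dictionaries: a VirtualService with a timeout and a retry policy, and a DestinationRule with outlier detection acting as a circuit breaker. The service name `orders` and all numeric values are hypothetical, and the field names reflect our understanding of the Istio networking API, so they should be verified against the Istio version in use.

```python
import json


def virtual_service(host, timeout="5s", attempts=3, per_try_timeout="2s"):
    """Route all HTTP traffic for `host` with an overall timeout and retries."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-routes"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [{"destination": {"host": host}}],
                "timeout": timeout,
                "retries": {
                    "attempts": attempts,
                    "perTryTimeout": per_try_timeout,
                    "retryOn": "5xx,connect-failure",
                },
            }],
        },
    }


def destination_rule(host, max_errors=5):
    """Eject instances of `host` that keep failing (circuit breaking)."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": f"{host}-circuit-breaker"},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "connectionPool": {"http": {"http1MaxPendingRequests": 100}},
                "outlierDetection": {
                    "consecutive5xxErrors": max_errors,
                    "interval": "30s",
                    "baseEjectionTime": "60s",
                    "maxEjectionPercent": 50,
                },
            },
        },
    }


if __name__ == "__main__":
    # "orders" is a hypothetical service name; the JSON output can be
    # committed to a repository or passed to `kubectl apply -f -`.
    for manifest in (virtual_service("orders"), destination_rule("orders")):
        print(json.dumps(manifest, indent=2))
```

Because the rules live in mesh configuration rather than in application code, they can be tuned per service without redeploying the microservices themselves.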
Deployment and configuration maintenance
Application deployment becomes very challenging in a multicloud architecture. We found the GitOps approach to be very simple and, at the same time, very flexible.
GitOps comes from the idea that all configuration and management of Kubernetes resources originates from a single source managed via a version control system, such as Git. All configuration files needed to manage resources such as Pods, Services, Deployments, Ingress Controllers, Volumes and others are defined declaratively as code in the repository. Deployment as code has a number of advantages:
- Changes are versioned, which is ideal for auditing purposes: the Git blame feature shows who introduced a given change in the configuration, when and why,
- Changes can be controlled via Pull Request constraints, for example when the organization requires a minimum number of approvers for deployments to certain environments,
- Deployments can be compared across environments or points in time, for example between tagged releases.
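The last point can be sketched in a few lines of Python. The example below flattens two configuration documents and reports the settings that differ. The `staging` and `production` documents are hypothetical and heavily simplified; in practice the same comparison is usually done with `git diff` between branches or tags.

```python
MISSING = "<missing>"  # marker for a setting that exists on one side only


def flatten(doc, prefix=""):
    """Turn nested dictionaries into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing setting."""
    a, b = flatten(left), flatten(right)
    changes = {}
    for path in sorted(a.keys() | b.keys()):
        old, new = a.get(path, MISSING), b.get(path, MISSING)
        if old != new:
            changes[path] = (old, new)
    return changes


# Hypothetical, simplified deployment settings for two environments.
staging = {"spec": {"replicas": 1, "image": "shop/cart:1.4.0"}}
production = {"spec": {"replicas": 3, "image": "shop/cart:1.4.0",
                       "strategy": "RollingUpdate"}}

for path, (old, new) in diff_configs(staging, production).items():
    print(f"{path}: {old} -> {new}")
```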
To start with GitOps on any Kubernetes cluster, a repository containing the deployment descriptors must be created on any available Git hosting service. A Flux agent installed on a given cluster periodically checks the repository for changes. If a change is detected, the Flux agent applies it and updates the affected resources.
The configuration is very flexible and allows us to customize the process to individual project needs, from automated Continuous Deployment for a single microservice to a manual process with a four-eyes check policy across many environments and many applications in a mono-repo layout.
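The pull-based loop described above can be modeled with a short Python sketch. This is a conceptual illustration and not how Flux is implemented: the function names are hypothetical, resources are reduced to plain dictionaries, and deleting resources that disappeared from the repository is an assumption, since pruning depends on how the agent is configured.

```python
import copy


def plan_changes(desired, actual):
    """Compare the repository state (desired) with the cluster state (actual)."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(name for name in desired.keys() & actual.keys()
                         if desired[name] != actual[name]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def reconcile(desired, actual):
    """Apply the plan to `actual` in place and return what was changed."""
    plan = plan_changes(desired, actual)
    for name in plan["create"] + plan["update"]:
        actual[name] = copy.deepcopy(desired[name])
    for name in plan["delete"]:  # assumption: pruning is enabled
        del actual[name]
    return plan


# Hypothetical state: the repository wants two services, the cluster runs
# an outdated "cart" and a leftover "legacy" service.
repository = {"cart": {"replicas": 2}, "catalog": {"replicas": 1}}
cluster = {"cart": {"replicas": 1}, "legacy": {"replicas": 1}}

print(reconcile(repository, cluster))
```

Running `reconcile` a second time with the same inputs produces an empty plan, which is the property that makes it safe for an agent to repeat the check on every polling interval.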
It’s a great alternative to many custom pipelines running on Jenkins or dozens of maintained Ansible playbooks.
Telemetry and logs
Development and deployment are just the first steps in the product lifecycle. The first line of support monitors resources, services and processes that are strategic to the business. Their role is to prevent or mitigate issues according to the operational framework. In this process, the right tools play a crucial role.
The Arc integration aggregates metrics and logs from registered Kubernetes clusters. From the moment a cluster is registered, the Insights service provides dashboards with basic information about its condition, such as CPU, memory, Nodes, Controllers and Pods, without any additional setup.
It makes it possible to monitor the cluster condition and the Kubernetes configuration state, which can be useful for tracking deployment progress in a given environment.
In a microservice environment, both operators and developers must be able to browse logs from multiple Pods. There are well-established tools, or even whole stacks, that could be considered. However, Azure Arc allowed us to set up a log workspace in almost no time using Azure Log Analytics. The installed agents automatically aggregate logs from all Pods and centralize them in a single location. Some of the logging levels require adjustment, as Azure Log Analytics charges for data ingestion.
This approach enables preparing queries that work across resource groups or even subscriptions. Using a single query engine, operators can request logs from microservices deployed on Kubernetes, but also from load balancers, network events or policy events.
This is especially useful when log entries are associated with an identifier such as a session ID, exception ID or request ID. For example, when the end user encounters an exception that is associated with a request ID, it only takes seconds to extract the whole end-to-end log trail for analysis and detection of the root cause.
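The idea of correlating entries by request ID can be shown with a small Python sketch that works on records already collected from several sources. It does not use the Azure Log Analytics API. The record fields (`timestamp`, `source`, `request_id`, `message`) and the sample data are assumptions made for this example.

```python
from datetime import datetime


def trace_request(records, request_id):
    """Return all records for one request, ordered by time."""
    matches = [r for r in records if r.get("request_id") == request_id]
    return sorted(matches, key=lambda r: datetime.fromisoformat(r["timestamp"]))


def format_trace(records):
    """Render records as one line each for quick reading."""
    return [f'{r["timestamp"]} [{r["source"]}] {r["message"]}' for r in records]


# Hypothetical log records aggregated from both clusters and a gateway.
LOGS = [
    {"timestamp": "2021-03-01T10:00:02+00:00", "source": "gke/payment",
     "request_id": "req-42", "message": "card declined"},
    {"timestamp": "2021-03-01T10:00:00+00:00", "source": "gateway",
     "request_id": "req-42", "message": "POST /checkout"},
    {"timestamp": "2021-03-01T10:00:01+00:00", "source": "aks/cart",
     "request_id": "req-42", "message": "cart validated"},
    {"timestamp": "2021-03-01T10:00:01+00:00", "source": "aks/cart",
     "request_id": "req-43", "message": "cart validated"},
]

for line in format_trace(trace_request(LOGS, "req-42")):
    print(line)
```

The approach only works if every service propagates the same identifier, so the request ID has to be passed along on each internal call.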
Azure Log Analytics is well integrated with other Azure components that complete the operational toolbox of the multicloud solution. It is possible to define alerts based on metrics, activity logs or custom log queries across local and remote clusters.
A flexible environment comes with the risk of lacking consistency in the long run. The bigger the organization, the bigger the probability. Defining, implementing, monitoring and auditing standards, policies and rules are part of the process which is very often enforced by internal or external regulations.
Our solution mainly uses the Azure platform and Azure Policy, which helps define standards as a set of programmable policies. The Azure Arc policy integration uses Open Policy Agent as the policy engine, together with the Gatekeeper agent installed on the Kubernetes cluster. This means that the policies are portable, and the solution remains resilient to vendor lock-in.
Azure policies can be applied across Azure resources in the same consistent way, including Kubernetes clusters and their resources. They can be organized hierarchically and combined using Azure Management Groups, which act as containers for resources across subscriptions. Automating policy enforcement ensures consistency, lowers development latency through immediate feedback, and helps with agility by allowing developers to operate independently without sacrificing compliance. Policies can affect resources in various ways. For example, a policy can abort resource creation or just mark a resource as compliant or non-compliant.
Open Policy Agent is a framework for managing policies in cloud-native environments using a single policy definition language. With the add-on agent installed on a cluster, a policy created through Azure is translated into Kubernetes-specific resources and deployed as ConstraintTemplates and Constraints.
The Azure portal presents a set of predefined Kubernetes-oriented policies, as well as policies applicable to all other resources.
Maintainers can restrict the deployment of containers based on image name, enforce a correct resource tagging policy or prevent the use of privileged containers. We find it very easy to maintain the organizational structure by isolating workstreams within agreed namespaces. Security officers can enforce HTTPS on ingress, authorized IP ranges or the use of RBAC on services. Organizations are not limited to this list, as they can define and maintain their own policies using the provided DSL.
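To illustrate what such policies check, here is a minimal Python sketch that evaluates a Pod definition against two of the rules mentioned above: allowed image sources and no privileged containers. It only imitates the decision logic. Real policies are written in the policy language of Open Policy Agent and enforced by Gatekeeper, and the function name, the `effect` parameter and the registry `registry.example.com` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    admitted: bool                       # False means creation is aborted
    violations: list = field(default_factory=list)


def evaluate_pod(pod, allowed_registries, effect="deny"):
    """Check a Pod manifest; effect is "deny" (block) or "audit" (report only)."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        # Prefixes should end with "/" so that look-alike hosts do not match.
        if not any(image.startswith(prefix) for prefix in allowed_registries):
            violations.append(f"{name}: image '{image}' is not from an allowed registry")
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")
    admitted = not violations or effect == "audit"
    return Verdict(admitted=admitted, violations=violations)


# Hypothetical Pod that breaks both rules.
pod = {"spec": {"containers": [{
    "name": "debug",
    "image": "docker.io/library/busybox",
    "securityContext": {"privileged": True},
}]}}

verdict = evaluate_pod(pod, allowed_registries=["registry.example.com/"])
print(verdict.admitted, verdict.violations)
```

The `effect` switch mirrors the two behaviors described earlier: with `deny` the resource is rejected, while with `audit` it is admitted and only flagged as non-compliant.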
Cloud adoption is not a trivial task, especially for non-greenfield projects. Multicloud solutions address a number of challenges, as well as the risks associated with using a single vendor or public clouds.
We have learned that a multicloud solution can be built in multiple ways, and we believe that we have achieved our primary goal: we have delivered a solution based on a microservice architecture that can span multiple cloud providers. It is scalable and can be extended to on-premises infrastructure, which could be a good starting point for demanding modern distributed platforms. On the other hand, this strategy is demanding to implement and is not a silver bullet for all cases.
We have used open-source technologies such as Kubernetes, Istio, Flux and Open Policy Agent, which are associated with the Cloud Native Computing Foundation, making the solution portable and compatible with industry standards.
Azure Arc embraces some of the mentioned technologies and helps integrate them with the rest of the Azure platform with a simple installation script. It helps solve real challenges with a minimal barrier to entry. Azure Arc is still available as a preview service on the Azure platform, but it’s already a fully featured set of tools for building multicloud solutions.