A Deep Dive into Architecting a Kubernetes Infrastructure
So far in this series, we have explored the questions one might have when starting out with Kubernetes and its ecosystem, and done our best to answer them. With those questions out of the way, let us dive into the next important step in our journey with Kubernetes and the infrastructure as a whole.
In this blog, we will look at the best possible way to architect your infrastructure for your use case and the various decisions you may want to take depending on your constraints.
Your architecture revolves largely around your use case, so take care to get it right and consult experts where needed. While it is important to get it right before you start, mistakes can happen, and with so much research under way, a new development can arrive any day and make your old way of thinking obsolete.
That is why I highly recommend that you architect for change and make your architecture as modular as possible, so that you have the flexibility to make incremental changes in the future if needed.
Let’s see how we would go about architecting our system with a client-server model in mind.
The Entry Point: DNS
In any typical infrastructure (cloud native or not), a request first has to be resolved by a DNS server, which returns the IP address of the server. How you set up your DNS should be based on the availability you require. If you require higher availability, you may want to distribute your servers across multiple regions or cloud providers, depending on the level of availability you would like to achieve.
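As a rough illustration of availability-driven DNS, the sketch below models a resolver that only hands out addresses for regions that pass a health check. The region names, the documentation-range IPs and the `resolve` helper are all hypothetical; real DNS providers implement this through their own health-checked routing policies.

```python
from dataclasses import dataclass


@dataclass
class RegionalEndpoint:
    region: str    # hypothetical region label
    ip: str        # address from the 203.0.113.0/24 documentation range
    healthy: bool  # result of the most recent health check


def resolve(endpoints: list[RegionalEndpoint]) -> list[str]:
    """Return the IPs of healthy regions, keeping the configured order.

    If every region is unhealthy, fall back to returning all IPs so that
    clients still have something to try.
    """
    healthy = [e.ip for e in endpoints if e.healthy]
    return healthy or [e.ip for e in endpoints]


endpoints = [
    RegionalEndpoint("us-east", "203.0.113.10", healthy=False),
    RegionalEndpoint("eu-west", "203.0.113.20", healthy=True),
]
answers = resolve(endpoints)
print(answers)
```

With more regions or providers in the list, losing any one of them simply removes its address from the answer.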
Content Delivery Network (CDN)
In some cases, you might need to serve users with as little latency as possible while also reducing the load on your servers. This is where a Content Delivery Network (CDN) plays a major role.
Does the client frequently request a set of static assets from the server? Are you aiming to improve the speed of delivery of content to your users while also reducing the load on your servers? In such cases, a CDN at the edge serving a set of static assets can help reduce both the latency for users and the load on your servers.
Is all your content dynamic? Are you fine with serving content to users with some level of latency in favor of reduced complexity? Or is your app receiving low traffic? In such cases, a CDN might not make much sense and you can send all the traffic directly to the global load balancer. But do note that a CDN also has the advantage of distributing the traffic, which can be helpful in the event of DDoS attacks on your server.
CDN providers include Cloudflare CDN, Fastly, Akamai CDN and StackPath. There is also a high chance that your cloud provider offers a CDN service, such as Cloud CDN from Google Cloud Platform, CloudFront from Amazon Web Services or Azure CDN from Microsoft Azure.
If there is a request that cannot be served by your CDN, the request will next hit your load balancer. Load balancers can be either regional with regional IPs or global with Anycast IPs, and in some cases you can also use load balancers to manage internal traffic.
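One way to picture the static-versus-dynamic decision is through the `Cache-Control` header the origin sends, since that is what a CDN typically uses to decide what it may cache. The sketch below is only an illustration: the extension list, the one-day lifetime and the `cache_control_for` helper are assumptions, not recommendations for any particular CDN.

```python
from pathlib import PurePosixPath

# Assumed set of file extensions that we treat as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a request path."""
    if PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS:
        # Static assets: allow the CDN and browsers to cache for one day.
        return "public, max-age=86400"
    # Dynamic responses: never cache, always go back to the origin.
    return "no-store"


for path in ("/assets/app.js", "/api/users"):
    print(path, "->", cache_control_for(path))
```

If almost every path ends up in the `no-store` branch, that is a hint that a CDN may add more complexity than benefit.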
Apart from routing and proxying the traffic to the appropriate backend service, the load balancer can also take care of responsibilities like SSL Termination, integrating with CDN and even managing some aspects of network traffic.
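To make the routing part concrete, here is a minimal sketch of one common strategy, round-robin selection that skips unhealthy backends. The `RoundRobinBalancer` class and the backend addresses are hypothetical, and production load balancers use far more sophisticated algorithms and health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._ring = cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
lb.mark_down("10.0.0.2:8080")
picks = [lb.next_backend() for _ in range(4)]
print(picks)
```

Here the second backend is marked down, so requests alternate between the first and the third until it is marked up again.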
While hardware load balancers do exist, software load balancers provide greater flexibility, cost reduction and scalability.
As with CDNs, your cloud provider should be able to provide a load balancer for you (such as GLB for GCP, ELB for AWS, Azure Load Balancer for Azure, etc.), but what is more interesting is that you can provision these load balancers directly from Kubernetes constructs. For instance, creating an ingress in GKE (aka GKE ingress) also creates a GLB for you behind the scenes to receive the traffic, and other features like CDN, SSL redirects, etc. can also be set up just by configuring your ingress as seen here.
While you should always start small, load balancers allow you to scale incrementally with architectures like this:
Networking and Security Architecture
The next important thing to take care of in your architecture is the networking itself. You may want to go for a private cluster if you want to increase security. There you can moderate the inbound and outbound traffic, mask IP addresses behind NATs, isolate networks with multiple subnets across multiple VPCs and so on.
How you set up your network typically depends on the degree of flexibility you are looking for and how you are going to achieve it. Setting up the right networking is all about reducing the attack surface as much as possible while still allowing for regular operations.
Protecting your infrastructure by setting up the right network also involves setting up firewalls with the right rules and restrictions, so that only the traffic you explicitly permit can flow to and from the respective backend services, both inbound and outbound.
In many cases, these private clusters can be protected by setting up bastion hosts and tunneling through them for all operations in the cluster, since all you have to expose to the public network is the bastion (aka jump host), which is typically set up in the same network as the cluster.
Some cloud providers also provide custom solutions in their approach towards Zero Trust Security. For instance, GCP provides its users with Identity-Aware Proxy (IAP), which can be used instead of typical VPN implementations.
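The default-deny idea behind such firewall rules can be sketched in a few lines. Everything here is illustrative: the CIDR ranges, the ports and the `is_allowed` helper are assumptions, and a real setup would express the same rules in your cloud provider's firewall configuration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allowlist of (source CIDR, destination port) pairs.
ALLOW_RULES = [
    (ip_network("10.0.0.0/16"), 5432),  # internal subnet -> database
    (ip_network("0.0.0.0/0"), 443),     # any IPv4 source -> HTTPS
]


def is_allowed(source_ip: str, port: int) -> bool:
    """Default deny: traffic passes only if at least one rule matches."""
    source = ip_address(source_ip)
    return any(
        source in network and port == allowed_port
        for network, allowed_port in ALLOW_RULES
    )


print(is_allowed("10.0.3.7", 5432))      # internal client to the database
print(is_allowed("198.51.100.4", 5432))  # external client to the database
```

Keeping the rule list short and explicit is what keeps the attack surface small.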
Once all of these are taken care of, the next step to networking would be setting up the networking within the cluster itself depending on your use case.
It can involve tasks like:
- Setting up the service discovery within the cluster (which can be handled by CoreDNS).
- Setting up a service mesh if needed (e.g. Linkerd, Istio, Consul, etc.)
- Setting up Ingress controllers and API Gateways (e.g. NGINX, Ambassador, Kong, Gloo, etc.)
- Setting up network plugins using CNI facilitating networking within the cluster.
- Setting up Network Policies moderating the inter-service communication and exposing the services as needed using the various service types.
- Setting up inter-service communication using protocols and tools like gRPC, Thrift or HTTP.
- Setting up A/B testing, which can be easier if you use a service mesh like Istio or Linkerd.
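As an example of one item from the list above, the sketch below builds a Kubernetes NetworkPolicy manifest that lets only one workload talk to another. The policy name, namespace, `app` labels and port are hypothetical, and the `allow_ingress_policy` helper is just a convenience for this illustration.

```python
import json


def allow_ingress_policy(name, namespace, target_app, client_app, port):
    """Build a NetworkPolicy allowing client_app pods to reach target_app pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods with this label may connect...
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    # ...and only on this TCP port.
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = allow_ingress_policy("api-allow-frontend", "default", "api", "frontend", 8080)
print(json.dumps(policy, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied as is, provided your CNI plugin enforces network policies.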
If you would like to look at some sample implementations, I would recommend looking at this repository which helps users set up all these different networking models in GCP including hub and spoke via peering, hub and spoke via VPN, DNS and Google Private Access for on-premises, Shared VPC with GKE support, ILB as next hop and so on, all using Terraform.
The interesting thing about networking in the cloud is that it need not be limited to a single cloud provider within your region, but can span multiple providers across multiple regions as needed. This is where projects like Kubefed or Crossplane could help.
If you would like to explore some of the best practices for setting up VPCs, subnets and the networking as a whole, I would recommend going through this page. The same concepts apply to whichever cloud provider you are using.
If you are using a managed service like GKE, EKS or AKS, Kubernetes itself is managed for you, which lifts a lot of complexity away from the users.
If you are managing Kubernetes yourself, you need to take care of many things, such as backing up and encrypting the etcd store, setting up networking among the various nodes in the cluster, periodically patching your nodes with the latest OS versions, and managing cluster upgrades to align with upstream Kubernetes releases. This is only recommended if you can afford to have a dedicated team that does just this.
Site Reliability Engineering (SRE)
When you maintain a complex infrastructure, it is very important to have the right observability stack in place so that you can find errors before your users notice them, predict possible changes, identify anomalies and drill down to exactly where the issue is.
This requires agents that expose metrics specific to each tool or application so that they can be collected for analysis (using either a push or a pull mechanism). And if you are using a service mesh with sidecars, they often provide metrics out of the box without any custom instrumentation on your part.
In any such scenarios, a tool like Prometheus can act as the time series database to collect all the metrics for you along with something like OpenTelemetry to expose metrics from the application and the various tools using built-in exporters. A tool like Alertmanager can send notifications and alerts to multiple channels, while Grafana will provide the dashboard to visualize everything in one place, giving users complete visibility on the infrastructure as a whole.
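To give a feel for what "exposing metrics" means in a pull-based setup, the sketch below renders a counter in the Prometheus text exposition format, which is what a scrape of a `/metrics` endpoint returns. The metric name, labels and the `render_counter` helper are made up for illustration; in practice you would normally rely on a client library or exporter rather than formatting this by hand.

```python
def render_counter(name, help_text, samples):
    """Render one counter metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed not to need escaping in this simplified sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


body = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"method": "GET", "code": "200"}, 1027)],
)
print(body, end="")
```

Prometheus would scrape text like this at a regular interval and store each sample as a point in a time series.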
In summary, this is what the observability stack involving Prometheus would look like:
Having complex systems like these also requires the use of log aggregation systems so that all the logs can be streamed into a single place for easier debugging. This is where people tend to use the ELK or EFK stack, with Logstash or Fluentd doing the log aggregation and filtering for you based on your constraints. But there are new players in this space, like Loki and Promtail.
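Conceptually, aggregation boils down to merging per-service streams into one ordered timeline and filtering out what you do not need. The sketch below shows that idea with made-up log entries; the tuple layout and the `aggregate` helper are assumptions and say nothing about how Logstash, Fluentd or Loki work internally.

```python
import heapq

# Each entry: (ISO-8601 UTC timestamp, service, level, message).
# Timestamps in this fixed format sort correctly as plain strings.
frontend_logs = [
    ("2024-01-01T10:00:00Z", "frontend", "INFO", "GET /checkout"),
    ("2024-01-01T10:00:02Z", "frontend", "ERROR", "upstream timeout"),
]
payments_logs = [
    ("2024-01-01T10:00:01Z", "payments", "DEBUG", "charging card"),
    ("2024-01-01T10:00:03Z", "payments", "INFO", "charge complete"),
]


def aggregate(*streams, drop_levels=frozenset({"DEBUG"})):
    """Merge time-sorted streams into one timeline, dropping noisy levels."""
    merged = heapq.merge(*streams, key=lambda entry: entry[0])
    return [entry for entry in merged if entry[2] not in drop_levels]


timeline = aggregate(frontend_logs, payments_logs)
for entry in timeline:
    print(*entry)
```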
This is how log aggregation systems like Fluentd simplify our architecture:
But what about tracing your request spanning across multiple microservices and tools? This is where distributed tracing also becomes very important especially considering the complexity that microservices come with. Tools like Zipkin and Jaeger have been pioneers in the area, with the recent entrant to this space being Tempo.
While log aggregation gives you information from various sources, it does not necessarily give you the context of a request, and this is where tracing really helps. But do remember that adding tracing to your stack adds overhead to your requests, since the contexts have to be propagated between services along with the requests.
This is what a typical distributed tracing architecture looks like:
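Context propagation is easier to picture with a concrete header. The sketch below uses the W3C Trace Context `traceparent` header format (version, trace ID, parent span ID, flags) to show how a trace ID travels from one service to the next while each hop gets its own span ID. The helper names are hypothetical, and real services would normally let a tracing library handle this.

```python
import secrets


def new_traceparent() -> str:
    """Start a new trace: 16-byte trace ID, 8-byte span ID, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID and flags from the parent, but mint a new span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Service A receives a request and starts a trace...
incoming = new_traceparent()
# ...then calls service B, forwarding the context in a request header.
outgoing_headers = {"traceparent": child_traceparent(incoming)}
print(incoming)
print(outgoing_headers["traceparent"])
```

The extra header and the span bookkeeping on every hop are where the overhead mentioned above comes from.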
But site reliability does not end with just monitoring, visualization and alerting. You have to be ready to handle any failures in any part of the system with regular backups and failovers in place so that either there is no data loss or the extent of data loss is minimized. This is where tools like Velero play a major role.
Velero helps you maintain periodic backups of various components in your cluster, including your workloads, storage and more, by leveraging the same Kubernetes constructs you already use. This is what Velero’s architecture looks like:
As you notice, there is a backup controller that periodically makes backups of the objects, pushing them to a specific destination with the frequency based on the schedule you have set. This can be used for failovers and migrations since almost all objects are backed up.
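Since Velero is driven by Kubernetes custom resources, a backup schedule is just another manifest. The sketch below builds one as a plain dictionary; the schedule name, cron expression, namespace list and retention period are hypothetical values, the `velero` namespace assumes a default installation, and the exact fields should be checked against the Velero version you run.

```python
import json


def velero_schedule(name, cron, namespaces, ttl="720h0m0s"):
    """Build a Velero Schedule manifest that backs up the given namespaces."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumes Velero is installed in its default "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard cron expression
            "template": {
                "includedNamespaces": list(namespaces),
                "ttl": ttl,  # how long each backup is retained
            },
        },
    }


nightly = velero_schedule("nightly-apps", "0 2 * * *", ["shop", "payments"])
print(json.dumps(nightly, indent=2))
```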
There are a lot of different storage provisioners and filesystems available, which can vary a lot between cloud providers. This calls for a standard like the Container Storage Interface (CSI), which helps push most of the volume plugins out of the tree, making them easier to maintain and evolve without the core being the bottleneck.
This is what the CSI architecture typically looks like supporting various volume plugins:
What about clustering, scaling and the various other problems that come with distributed storage?
This is where file systems like Ceph have already proved themselves. However, Ceph was not built with Kubernetes in mind and is hard to deploy and manage, which is where a project like Rook can help.
While Rook is not coupled to Ceph and supports other filesystems like EdgeFS, NFS, etc. as well, Rook with Ceph CSI is like a match made in heaven. This is what the architecture of Rook with Ceph looks like:
As you can see, Rook takes up the responsibility of installing, configuring and managing Ceph in the Kubernetes cluster. The storage is distributed underneath automatically as per the user preferences. All this happens without the app being exposed to any complexity.
A registry provides you a user interface where you can manage various user accounts, push/pull images, manage quotas, get notified on events with webhooks, do vulnerability scanning, sign the pushed images, and also handle operations like mirroring or replication of images across multiple image registries.
If you are using a cloud provider, there is a high chance that they already provide an image registry as a service (e.g. GCR, ECR, ACR, etc.), which removes a lot of the complexity. If your cloud provider does not provide one, you can also go for third-party registries like Docker Hub, Quay, etc.
But what if you want to host your own registry?
This may be needed if you either want to deploy your registry on-premises, want to have more control over the registry itself, or want to reduce costs associated with operations like vulnerability scanning.
If this is the case, then going for a private image registry like Harbor might actually help. This is what the architecture of Harbor looks like:
Harbor is an OCI compliant registry made of various open source components, including Docker registry V2, Harbor UI, Clair, and Notary.
Kubernetes acts as a great platform for hosting all your workloads at any scale, but this also calls for a standard way of deploying the applications with a streamlined continuous integration/continuous delivery (CI/CD) workflow. This is where setting up a pipeline like this can really help.
Some third-party services like Travis CI, CircleCI, GitLab CI or GitHub Actions include their own CI runners. You just define the steps in the pipeline you are looking to build. This would typically involve building the image, scanning the image for possible vulnerabilities, running the tests, pushing the image to the registry and, in some cases, provisioning a preview environment for approvals.
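Whatever CI service you pick, the core behavior is the same: run the steps in order and stop as soon as one fails, so that a broken image never reaches the registry. The sketch below models that with placeholder steps; `run_pipeline` and the lambdas standing in for real build, scan, test and push commands are purely illustrative.

```python
from typing import Callable

# A step is a (name, action) pair; the action returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            print(f"step failed: {name}")
            break
        completed.append(name)
    return completed


# Placeholder actions; a real pipeline would shell out to actual tools.
steps = [
    ("build image", lambda: True),
    ("scan image", lambda: True),
    ("run tests", lambda: False),  # simulate a failing test suite
    ("push to registry", lambda: True),
]
completed = run_pipeline(steps)
print(completed)
```

Because the tests fail in this run, the push step never executes.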
Now, while the steps would typically remain the same if you are managing your own CI runners, you would need to configure them to be set up either within or outside your clusters with appropriate permissions to push the assets to the registry.
We have gone over the architecture of the Kubernetes-based cloud native infrastructure. As we have seen above, various tools address different problems with infrastructure. They are like Lego blocks, each focusing on a specific problem at hand, abstracting away a lot of complexity for you.
This allows you to adopt Kubernetes incrementally rather than getting on board all at once, using just the tools you need from the entire stack depending on your use case.
If you have any questions or are looking for help or consultancy, feel free to reach out to me @techahoy or via LinkedIn.