At Giant Swarm, we manage Kubernetes clusters for customers 24/7, both on-premises and in the cloud. That means we do not just set something up and hand it over, but we actually take care that it’s operational and up-to-date at all times.
The goal is to give end-users a carefree Kubernetes-as-a-Service (KaaS) no matter the underlying infrastructure. Most importantly that means getting the full vanilla Kubernetes experience out-of-the-box, anything that is possible in a standard Kubernetes setup should be available, including Ingress, NetworkPolicy API, roll-based access control (RBAC), alpha features, or even privileged access (if not disallowed by your security department).
Soft vs. Hard Multitenancy
For most companies, this also means they need multitenancy as they have different stages, teams, projects that should all get Kubernetes.
While we encourage the use of Namespaces, NetworkPolicies, and RBAC and help with the integration, we know that “People must come to things in their own time, in their own way, for their own reasons, or they never truly come at all,” as Dee Hock once wrote.
Furthermore, a lot of the soft multitenancy functionality of Kubernetes is still under active development and trust (especially for concepts like NetworkPolicies, which are new to most enterprise security) will only come with time.
Especially in the enterprise context, soft multitenancy is often not enough. There are different service classifications each with their own requirements, most extreme when there’s PCI-compliant and non-PCI services or privacy-relevant data. Environments need to be separated as e.g. production clusters should not share anything with testing clusters.
Even without those hard requirements, teams often just want to try things out on a fresh cluster or test run their services on new or upcoming Kubernetes versions.
Bottom-line, most of the time we run multiple clusters at our customer data centers (and in their cloud accounts) just like we do in our own data center.
We knew from the beginning that, if we provide others with Kubernetes, we should and actually would like to use the same concepts within our product as well.
That means we want to run our everything, including the infrastructure components that set up the user clusters, the microservices that make up our API, the outer monitoring, but also the actual end-user-facing Kubernetes clusters inside Kubernetes. And with that we mean not only self-hosted, in the sense that they are managed by a kubelet, but rather inside a full-blown Kubernetes cluster.
To make talking about the different parts less confusing we call the outer “host” Kubernetes “Giantnetes” (G8s) and the end-user-facing “guest” clusters Kubernetes (K8s).
In its basic form, G8s is a Kubernetes cluster, that is running on Amazon Web Services and bare-metal machines. The focus in this overview is a bare-metal deployment.
Next to the G8s masters, the control plane contains several groups of deployments, including infrastructure services (Vault, etcd,…), monitoring, and the Giant Swarm API. The latter consists of a group of microservices that cluster admins can talk to create, manage, and destroy Kubernetes clusters for their users. Lastly, the control plane also contains the master nodes of the guest Kubernetes clusters. To access the Kubernetes clusters each K8s API gets its own Ingress rule inside G8s.
The workers of these Kubernetes clusters reside on nodes in what we call the application zone. The application zone can be split into multiple zones if workloads need to be separated onto different physical hosts. For example, a cluster can be split into a three-tier (e.g. presentation, application, data) or multitier architecture if this is a desired separation.
For isolating the Kubernetes clusters, we start both the K8s masters and their worker nodes in VMs. On bare-metal, we are using QEMU in a Docker container to start our CoreOS Container Linux in KVM. VMs are parametrized via Cloud-Config to either create a master or a worker node and configure the Kubernetes services to form a cluster.
Etcd currently runs inside the VMs and stores its data on a persistent volume, but we are working on externalizing the etcd and using the etcd Operator for managing the etcd clusters.
Giantnetes uses Calico to enable network policies on the outer layer. On top of that, as we want the Kubernetes VMs to live in their own isolated network, we give each Kubernetes cluster its own Flannel Virtual Network Interface (VNI) using the VXLAN backend. For this, we run a Flannel server per host and a Flannel client per cluster on each host that harbors nodes of that cluster. Inside the respective Flannel networks, we again run Calico Border Gateway Protocol (BGP), so the users can use the NetworkPolicy API to control their network traffic.
Certificates and Access Control
A proper production Kubernetes cluster should be secured with TLS, and not only the API server but every component that makes up the Kubernetes. For this, each cluster gets its own root CA using a PKI backend in Vault. Then each component of the cluster (e.g. api-server, kubelet, etcd,…) gets its own role and certificates issued from that role. The certificates get handed into the VMs via mounts.
To have a way to securely access the cluster, we create roles and issue certificates from the same PKI backend for users to use in their Kubernetes clients. As each certificate belongs to a specific user, a Kubernetes admin can then bind those users to RBAC roles.
From Deployments to Operators
In our first iteration that we described a few months ago, we were starting Kubernetes clusters using basic Kubernetes concepts like Deployments, InitContainers, ConfigMaps, etc. Since then, we’ve started working on writing custom controllers that manage these resources based on third-party resources (TPRs), which are a way to easily extend the Kubernetes API.
CoreOS fittingly calls this concept Operators, as they represent codified operational knowledge about specific services that you want to manage.
For now, we have a KVM operator, which creates K8s clusters on bare-metal via KVM, and an AWS operator, which creates similar K8s clusters on AWS.
Both act on a cluster third-party object (TPO), which defines the specificities of a Kubernetes cluster. Just like for example a deployment resource the cluster TPO can be used to declaratively tell Giantnetes to deploy or change a Kubernetes cluster by writing a simple YAML file. The configuration options include the Kubernetes configuration/version; Ingress Controller configuration/version; Flannel configuration; Certificate attributes; and Master and Worker configurations.
Future of Operators
As we are working on these Operators we realized two issues that we already started working on:
- Operators should not be too big — we need micro-operators!
- Operators have a lot of boilerplate — we need an Operatorkit!
Operators just like Linux tools should have single responsibilities. Just like Microservices, Operators should be kept simple and maintainable.
A Kubernetes cluster is a rather complex resource. Thus, it should not be managed by a single Operator, but rather by a group of interacting micro-operators. Some examples of micro-operators in the Kubernetes cluster TPR context are a certificate operator and the etcd Operator, so a cluster operator could split up the cluster TPO it gets and create TPOs for the other micro-operators to get the job done.
We build our microservices internally based on microkit, which is an opinionated microservice framework that can be used as a library in golang microservices to speed up development. As we experienced building our first Operators, just like with the microservices above there is a lot of boilerplate involved. Our goal with Operatorkit will be to reduce the boilerplate in our Operators and collect it in that library.
We believe Kubernetes is perfectly suited to manage complex deployments like Kubernetes itself, not only by providing flexible primitives to deploy and manage these but also because of its extendability.
Third-party resources are a great way to extend the Kubernetes API with resources that you want to automate in your clusters.
Custom controllers can then act upon these TPRs to manage the actual resources using the underlying primitives Kubernetes provides with the added operational knowledge codified in them.
We are happy to see that the operator concept is gaining traction quickly in the Kubernetes community. There are already lists of the many (mostly open source) Operators out there and they are growing fast.
By open-sourcing our Operators and releasing Operatorkit hopefully soon, we hope we can give back to the community and enable more people out there to write Operators.
For more on Kubernetes networking and related topics, come to CloudNativeCon +KubeCon Europe 2017 in Berlin, Germany March 29-30.