Beyond Block and File: COSI Enables Object Storage in Kubernetes
Kubernetes has been invaluable in developing today’s applications. There are other options for container orchestration systems, but Kubernetes is recognized as one of the most popular and mature options. Let’s now look at the power of this system, its limitations and the innovation that has made it even more usable and beneficial.
Kubernetes: A Quick Overview
For those unfamiliar with Kubernetes, it’s an orchestration system used to deploy and manage containerized applications. In a cluster, Kubernetes ensures the running state always converges to specifications provided by application developers.
The system uses abstract “objects” to define an API. These objects describe the resources that developers need to run their application (such as containers and load balancers), how many instances are desired, and where in the cluster the application is allowed to run. These abstractions allow developers to focus on their application instead of the environment it will run in. They also encourage application architectures that are portable, scalable and resilient.
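To make the idea of converging on a declared state concrete, here is a minimal sketch of a reconcile step. It is a toy model, not Kubernetes code: the `Cluster` class, the `reconcile` function and the application names are all hypothetical, and a real controller watches the API server rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster: maps an application name to its running instance count."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(cluster: Cluster, desired: dict[str, int]) -> list[str]:
    """Compare running state with desired state and apply the difference."""
    actions = []
    for app, want in desired.items():
        have = cluster.running.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} x {app}")
        elif have > want:
            actions.append(f"stop {have - want} x {app}")
        cluster.running[app] = want
    # Anything still running that is no longer declared gets removed.
    for app in list(cluster.running):
        if app not in desired:
            actions.append(f"stop {cluster.running.pop(app)} x {app}")
    return actions


cluster = Cluster(running={"web": 1, "legacy": 2})
spec = {"web": 3, "cache": 1}
print(reconcile(cluster, spec))  # first pass closes the gap
print(reconcile(cluster, spec))  # second pass finds nothing left to do
```

The developer only writes `spec`. Deciding what to start or stop is left to the system, which is why a second pass over an already converged cluster produces no actions.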
The Limitation of Kubernetes
It can become confusing at this point. It’s typical for Kubernetes workloads to be very dynamic, especially in microservices architectures where many ephemeral, stateless services make up the application. Service containers are deployed in pods and usually spread across a cluster of machines. Pods can be destroyed, created again and replicated, and they take resources wherever those happen to be available in the system at that moment. For example, if a service fails, Kubernetes will start it up again, but not necessarily on the same machine in the cluster.
It’s an understatement to say that for persistent storage, this is not ideal. However, almost every production app includes stateful services — workloads that require their state to be saved and accessed later. To keep data safe and to make sure it is available anywhere in the Kubernetes cluster, production apps use remote storage away from the Kubernetes cluster the app runs in.
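A small simulation shows why rescheduling and local state do not mix. Everything here is hypothetical (the `Node` class, the `schedule` helper, the node names); it only models the behavior described above, in which a restarted pod may land on a different machine.

```python
import random
from typing import Optional


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.local_disk: dict[str, str] = {}  # lives and dies with this machine


remote_store: dict[str, str] = {}  # stands in for storage reachable from every node


def schedule(nodes: list[Node], exclude: Optional[Node] = None) -> Node:
    """Pick any available node; here, any node except the one that failed."""
    return random.choice([n for n in nodes if n is not exclude])


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]

first = schedule(nodes)
first.local_disk["cart"] = "3 items"  # state written next to the pod
remote_store["cart"] = "3 items"      # same state written to remote storage

second = schedule(nodes, exclude=first)  # the pod is restarted elsewhere
print(second.local_disk.get("cart"))     # None: local state did not follow the pod
print(remote_store.get("cart"))          # 3 items
```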
Persistent volumes (PVs) are the mechanism by which Kubernetes exposes permanent storage to the user. A PV resource is available to the whole cluster, which decouples it from any pod’s lifecycle. Persistent volumes are most often backed by attached external storage. Kubernetes uses control plane interfaces to link with external storage, and storage vendors have to develop volume plugins that work with Kubernetes. For some time, these volume plugins had to be developed and released as part of the Kubernetes codebase. Such plugins are usually called in-tree volume plugins.
For storage vendors and Kubernetes developers alike, these plugins create issues. If vendors want to offer their solution for Kubernetes, vendor-specific code has to be compiled and shipped as part of Kubernetes. This ties the delivery of updated integration code to the Kubernetes release cycle and, at the same time, makes it difficult for the Kubernetes community to test the software it ships. Moreover, from the point of view of Kubernetes users, the choice of storage vendor and type is limited to the few that have plugins in the repository.
Yes to Block and File Storage – What About Object?
The Kubernetes community introduced the Container Storage Interface (CSI) in 2017 to overcome the limitations imposed by in-tree storage plugins. CSI support reached general availability (GA) in Kubernetes 1.13. CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads running on Kubernetes (or any other container orchestrator that uses CSI). It makes the Kubernetes volume layer truly extensible. With CSI, third-party storage providers can create and deploy volume plugins that expose their system to Kubernetes without having to touch the Kubernetes codebase.
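The shift from in-tree to out-of-tree plugins can be illustrated with a plain interface and a registry. This is only a sketch of the pattern: real CSI drivers expose gRPC services, and every name below (`VolumeDriver`, `DriverRegistry`, `ExampleVendorDriver`, the driver name and the returned URL scheme) is made up for illustration.

```python
from typing import Protocol


class VolumeDriver(Protocol):
    """Hypothetical stand-in for the contract a storage plugin fulfils."""

    def create_volume(self, name: str, size_gib: int) -> str: ...


class DriverRegistry:
    """Orchestrator side: knows only the interface, never any vendor code."""

    def __init__(self) -> None:
        self._drivers: dict[str, VolumeDriver] = {}

    def register(self, driver_name: str, driver: VolumeDriver) -> None:
        self._drivers[driver_name] = driver

    def provision(self, driver_name: str, name: str, size_gib: int) -> str:
        if driver_name not in self._drivers:
            raise LookupError(f"no driver registered as {driver_name!r}")
        return self._drivers[driver_name].create_volume(name, size_gib)


class ExampleVendorDriver:
    """Vendor side: shipped and updated independently of the orchestrator."""

    def create_volume(self, name: str, size_gib: int) -> str:
        return f"example-vendor://{name}?size={size_gib}Gi"


registry = DriverRegistry()
registry.register("example.vendor.io", ExampleVendorDriver())
print(registry.provision("example.vendor.io", "db-data", 10))
```

Because the registry depends only on the interface, adding or updating a vendor driver requires no change to, and no release of, the orchestrator itself.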
But there is a third storage option: object. Because there are unprecedented volumes of unstructured data being produced today, object storage is quickly and rightfully growing in popularity. Today, applications in production often choose object storage to persist their data. There are a variety of reasons why, but some of the primary ones are that it’s cost-efficient and extremely scalable, it enables granular access permissions, and it’s easily accessible through network APIs.
COSI Sets the Stage for Object Storage
Until recently, Kubernetes’ persistent storage mechanism dealt exclusively with block and file storage. However, given the popularity of object storage, the Kubernetes community went on to define a new standard: the Container Object Storage Interface (COSI). This new standard comprises a set of abstractions for the provisioning and management of object storage that aims to be a common layer across multiple vendors. The design is modeled after CSI and supported by developers of various open source and commercial storage systems.
These are the two control-plane APIs that COSI defines:
- An API that vendors implement so that their storage system plugins can act as a COSI backend.
- User-facing APIs for bucket provisioning and access.
This technology is vendor-neutral because the COSI standard does not implement any specific object storage protocol or API.
COSI’s Bucket APIs
Object storage can’t use the block and file storage primitives used in CSI. The unit of provisioning in object storage is a bucket (not a volume) and buckets are not mounted. Rather, they are accessed over the network. Moreover, object storage allows for more granular access control.
COSI introduces a set of new resources to work with buckets to implement object storage abstraction:
- BucketClass: A cluster-scoped resource containing fields defining the provisioner and a parameter set for configuring new buckets.
- BucketRequest: A namespaced resource representing a request for a new backend bucket or access to an existing bucket.
- Bucket: A cluster-scoped resource referenced by a BucketRequest and containing connection information and metadata for a backend bucket.
- BucketAccessClass: A cluster-scoped resource containing fields to specify policies that may be used to access buckets.
- BucketAccessRequest: A namespaced resource representing a request for access to an existing bucket.
- BucketAccess: A cluster-scoped resource for granting bucket access.
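The relationship between the provisioning resources can be sketched with a few data classes. The field names and the `bind` helper below are assumptions made for illustration and are not taken from the COSI API definitions; only the resource names and their scopes come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class BucketClass:  # cluster-scoped
    name: str
    provisioner: str
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class BucketRequest:  # namespaced
    namespace: str
    name: str
    bucket_class_name: str


@dataclass
class Bucket:  # cluster-scoped
    name: str
    provisioner: str
    parameters: dict[str, str]
    requested_by: str  # "namespace/name" of the originating BucketRequest


def bind(request: BucketRequest, classes: dict[str, BucketClass]) -> Bucket:
    """Resolve a request against its class and describe the bucket to create."""
    bucket_class = classes[request.bucket_class_name]
    return Bucket(
        name=f"{request.namespace}-{request.name}",
        provisioner=bucket_class.provisioner,
        parameters=dict(bucket_class.parameters),  # copy, so buckets stay independent
        requested_by=f"{request.namespace}/{request.name}",
    )


classes = {"standard": BucketClass("standard", "example.vendor.io", {"region": "eu"})}
bucket = bind(BucketRequest("shop", "invoices", "standard"), classes)
print(bucket)
```

The split mirrors the description above: an administrator defines the class once for the whole cluster, while each team files requests in its own namespace. The access resources (BucketAccessClass, BucketAccessRequest, BucketAccess) would follow the same class, request and result pattern.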
In addition, COSI defines a service called “provisioner” (see Figure 1). The provisioner must be implemented by the vendor in order to communicate with COSI and offer a storage solution compatible with the standard. You can find the gRPC specification on the COSI GitHub repository.
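As a rough picture of what a vendor-side provisioner is responsible for, consider the sketch below. It is not the gRPC specification: the method names, signatures and in-memory backend are hypothetical, chosen only to show a bucket and access lifecycle behind a vendor-neutral interface.

```python
from abc import ABC, abstractmethod


class Provisioner(ABC):
    """Hypothetical vendor-neutral contract; the real one is defined in gRPC."""

    @abstractmethod
    def create_bucket(self, name: str, parameters: dict[str, str]) -> str: ...

    @abstractmethod
    def delete_bucket(self, bucket_id: str) -> None: ...

    @abstractmethod
    def grant_access(self, bucket_id: str, account: str) -> str: ...

    @abstractmethod
    def revoke_access(self, bucket_id: str, account: str) -> None: ...


class InMemoryProvisioner(Provisioner):
    """Toy backend: tracks which accounts may access which bucket."""

    def __init__(self) -> None:
        self.buckets: dict[str, set[str]] = {}

    def create_bucket(self, name: str, parameters: dict[str, str]) -> str:
        self.buckets.setdefault(name, set())  # repeating the call changes nothing
        return name

    def delete_bucket(self, bucket_id: str) -> None:
        self.buckets.pop(bucket_id, None)

    def grant_access(self, bucket_id: str, account: str) -> str:
        self.buckets[bucket_id].add(account)
        return f"credentials-for-{account}"  # placeholder, not a real secret

    def revoke_access(self, bucket_id: str, account: str) -> None:
        self.buckets[bucket_id].discard(account)


backend = InMemoryProvisioner()
bucket_id = backend.create_bucket("shop-invoices", {"region": "eu"})
print(backend.grant_access(bucket_id, "billing-service"))
backend.revoke_access(bucket_id, "billing-service")
print(backend.buckets)
```

Making creation and deletion safe to repeat is a design choice of this sketch. It suits a system that retries operations until the actual state matches the declared one.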
COSI Saves the Day
CSI overcame an initial container storage hurdle, but it’s not enough. While the existing primitives in CSI do not apply to object storage, COSI provides a common layer of abstraction for provisioning and managing the lifecycle of object storage buckets. With COSI, Kubernetes cluster users are able to manage object storage in a standardized, native way. Storage vendors are able to expose their object storage solution through COSI without ever touching Kubernetes code. Storage is able to meet current needs, and everyone wins.