Choosing the Right Container-Native Storage for the AWS Public Cloud
As more independent software vendor (ISV) applications are packaged for deployment via Kubernetes, making it simple to shift between execution venues, stateful workloads are becoming a standard pattern within these clusters. Kubernetes is on its way to becoming the industry standard for managing cloud native data applications, with the Data on Kubernetes community growing at a rapid pace.
In a new report from Architecting IT on container-native storage performance in the public cloud, Chris Evans takes a closer look at the storage options available on one of the most popular hyperscalers, Amazon Web Services (AWS), and how each of those options performs.
The question posed: As we start to see new data-centric workloads arrive as Kubernetes-native solutions, which storage makes the most sense to use, and how do we select the most cost-effective option? For example, running many hosted databases on a service like Amazon RDS is simple and convenient, but the costs can add up quickly across tens or hundreds of individual instances.
With new offerings from companies like EnterpriseDB and its CloudNativePG (CNPG) operator, could you consolidate these virtual-machine-based services into a Kubernetes cluster to optimize your cloud spend? For example architectures, see talks such as a recent KubeCon session on running CNPG on EKS with NVMe-based storage and Ondat CNS.
We also wanted to look at how to optimize cloud storage spend for workloads that require extreme levels of resilience (regional replication) and performance (such as newer artificial intelligence or machine learning workloads).
Most application teams looking to run an application in Kubernetes simply want to name a storage class for their application to consume in their Kubernetes cluster, and off they go. The cluster operations team needs to make sure this storage class delivers dynamic storage via a container storage interface (CSI) plugin, and needs to understand the nonfunctional characteristics (performance, resilience, cost) the solution delivers. This creates a clean, well-defined interface, and by using multiple storage classes, different capabilities can be exposed.
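As a minimal sketch of that interface (assuming the AWS EBS CSI driver is installed in the cluster; the class name and gp3 parameters here are illustrative choices, not values from the report), a storage class exposing dynamic EBS provisioning might look like this:

```yaml
# Illustrative StorageClass for dynamic EBS gp3 provisioning via the
# AWS EBS CSI driver. Names and parameter values are example choices.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3            # hypothetical class name for applications to reference
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                # EBS volume type
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # provision in the consuming pod's AZ
allowVolumeExpansion: true
reclaimPolicy: Delete
```

An application team then only needs to set `storageClassName: ebs-gp3` in its PersistentVolumeClaim, which is exactly the clean hand-off between application and operations teams described above.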
In the report you will see AWS native storage classes as well as container native storage (CNS) classes. The capabilities of each are explained.
For example, does the CSI provide storage that is resilient, and where does that resilience boundary end? EBS storage, for instance, delivers incredible resilience but does not span availability zones (AZs). This, of course, opens up some very interesting patterns where you can combine different storage technologies, such as EBS with the Ondat CSI plugin, to get within-AZ resilience from EBS and cross-AZ, region-level replication and resilience from Ondat.
Another recent Architecting IT report, which benchmarked the most popular CNS solutions for Kubernetes, suggests this combination is a good choice, with Ondat recognized as the fastest CNS available today.
Jumping to an overview of the report, the bulk of the analysis was done using the industry-standard fio tool to simulate the two ends of the I/O workload scale. The IOPS test cases use small block sizes with a 100% random access pattern to simulate workloads such as artificial intelligence/machine learning pipelines or PostgreSQL/MySQL databases.
The latency tests use the same random workload with a smaller queue depth to capture the true latency of the storage. Last, the throughput tests, which simulate streaming-data workloads, use small queue depths with very large block sizes and sequential access to the data. Coupling these results with the capabilities of the backend data store, such as resilience and durability, platform architects should be able to create a storage class for every application need.
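To make the three workload shapes concrete, here is a sketch of what fio job definitions along these lines could look like. These are not the report's actual job files; the block sizes, queue depths and target device are assumed values for illustration:

```ini
; Hypothetical fio jobs approximating the three test shapes described above.
; Example invocation: fio --filename=/dev/nvme1n1 storage-tests.fio
[global]
ioengine=libaio
direct=1          ; bypass the page cache to measure the device itself
time_based=1
runtime=60

[iops]            ; small random blocks at high queue depth -> peak IOPS
rw=randrw
bs=4k
iodepth=32
numjobs=4

[latency]         ; same random pattern at queue depth 1 -> true latency
stonewall         ; wait for the previous job to finish before starting
rw=randrw
bs=4k
iodepth=1
numjobs=1

[throughput]      ; large sequential blocks, small queue depth -> bandwidth
stonewall
rw=read
bs=1m
iodepth=4
numjobs=1
```

The `stonewall` option serializes the jobs so each test shape runs on its own, rather than all three competing for the device at once.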
To summarize the report: Using raw AWS building blocks and machine types like I3en with local NVMe drives, CNS solutions such as Ondat can deliver much faster, lower-latency storage with availability-zone resilience, at an incredibly low cost per IOPS and with regional-level resilience. EBS and EFS deliver incredible resilience and are very cost-effective measured in cost per gigabyte, and EBS can also be complemented with CNS solutions like Ondat to provide availability-zone failover, adding regional resilience for production workloads.
It is also worth noting that the report highlights how multidimensional the performance considerations can be, beyond our initial look at just the storage numbers from fio. The very spiky write speeds seen on the NVMe drives once inter-availability-zone replication is enabled (notice, for example, the strange periodic 48-second falloff in performance, which looks like a quality-of-service or throttling artifact) indicate we should also be thinking about, and testing, network latency between availability zones.
It would be interesting to repeat the testing and also measure the actual network performance of the instances (i3en.xlarge claims "up to 25 Gbps") using tools like iperf, observing how both intra-availability-zone traffic (the noisy-neighbor effect) and inter-availability-zone traffic affect performance.
I recommend that all cluster architects with AWS-hosted Kubernetes clusters in their fleet read the full report, as it explains the capabilities of the underlying storage in quantitative terms. With that understanding, application developers responsible for applications running in these Kubernetes clusters should be able to make better decisions about the correct storage to use.
If you would like to try Ondat, there is a fully featured community edition (up to 1TiB of storage) that you can use for free today.