Cloud Native Storage: A Primer
We recently debated at a technical forum what cloud native storage is, which led me to believe that this topic deserves a deeper discussion and more clarity.
First, though, I want to define what cloud native applications are, as some may think that containerizing an application is enough to make it “cloud-native.” That view is misleading and falls short of the true benefits of cloud native applications, which come from elastic services and agile development. The following three attributes are the main benefits, without which we’re all missing the point:
- Durability — services must sustain component failures
- Elasticity — services and resources grow or shrink to meet demand
- Continuity — versions are upgraded while the service is running
The cloud-native architecture, which originated in hyper-scale cloud applications, revolves around microservices, i.e. small, stateless and decoupled application fragments. Many similar microservice instances can be deployed (using Docker or Kubernetes) to address service elasticity and resiliency. Multiple tiers of microservices are part of a bigger and evolving application.
The 12-Factor methodology specifies that microservice instances must not persist any configuration, logs, or data locally, which enables the cloud-native durability, elasticity and continuous integration attributes. State and data are stored in decoupled scale-out log streams, message queues, object storage, key-value stores and databases.
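As a rough illustration of that rule, here is a minimal Python sketch of a stateless request handler. The names `load_config`, `log_event`, `handle_order` and the `ORDERS_TABLE` variable are hypothetical and chosen only for this example, and the dict-like `store` stands in for a remote key-value or database client.

```python
import json
import os
import sys


def load_config(env=os.environ):
    """Read configuration from the environment instead of a local file."""
    return {"table": env.get("ORDERS_TABLE", "orders")}


def log_event(event, stream=sys.stdout):
    """Emit logs as an event stream on stdout; the platform collects them."""
    stream.write(json.dumps(event) + "\n")


def handle_order(order, store, config):
    """Process one request without keeping any state inside the instance.

    `store` stands in for a decoupled data service (hypothetical; any
    object with dict-style item assignment works for this sketch).
    """
    key = f"{config['table']}/{order['id']}"
    store[key] = order  # state lives in the external service
    log_event({"action": "stored", "key": key})
    return key
```

Because the instance holds no configuration, logs or data of its own, any copy of it can be killed, replaced or multiplied without losing anything.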
This methodology is quite different from the one used in traditional monolithic/scale-up enterprise apps and current IT infrastructure (sometimes called IT “pets”), where apps require lots of configuration, logs, data and state (stored per workload in “virtual disks”). Applications with local state or cache may lead to inconsistencies upon failures, an inability to scale out and a difficulty in upgrading without downtime.
Microservices Are Disposable
Microservices can be added, deleted, and scaled instantly, which drives a few requirements for the data layer:
- All data updates must be atomic and go to a durable, decoupled and shared persistence layer. We cannot have a temporary dirty cache or broken transactions, use local logs, perform partial file updates that may corrupt data, or maintain local journals in the microservice.
- Data access must be concurrent (asynchronous). Multiple microservices read and update the same data repository at the same time. Updates should be serialized, with no blocking, locking or exclusivity. This allows us to scale application performance linearly as we increase the number of microservice instances and maintain application availability.
- The data layer must be elastic and durable so that we can support constant data growth without disrupting the service. Data is partitioned and replicated across a variable number of nodes in one or more locations to provide resiliency.
- Data should have a flexible structure and schema to allow continued development of new features and application versions.
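One common way to get atomic, concurrent updates without exclusive locks is a conditional (compare-and-set) write with retry. The sketch below is only an illustration under that assumption: `VersionedStore`, `put_if_version`, `increment` and `run_workers` are hypothetical names, the store is an in-memory stand-in for a shared data service, and its internal lock merely simulates the atomicity a real service would provide on the server side.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a shared data service (hypothetical).

    The internal lock only simulates server-side atomicity; callers never
    hold a lock across a read-modify-write cycle.
    """

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            return self._data.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if nobody else updated the key since we read it."""
        with self._guard:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # a concurrent writer won; caller retries
            self._data[key] = (version + 1, value)
            return True


def increment(store, key):
    """Read-modify-write with retry instead of an exclusive lock."""
    while True:
        version, value = store.get(key)
        if store.put_if_version(key, version, (value or 0) + 1):
            return


def run_workers(store, key, workers=4, updates=100):
    """Simulate several microservice instances updating the same record."""
    def work():
        for _ in range(updates):
            increment(store, key)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(key)
```

Calling `run_workers(VersionedStore(), "page_views")` returns the final `(version, value)` pair. No update is lost even though no instance ever owns the record exclusively, because a writer that loses a race simply re-reads and retries.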
Cloud provider offerings include services that address the different data models (objects, K/V, tables, documents, messages, etc.). Amazon S3, DynamoDB, Aurora and Kinesis, Google services like BigQuery and Spanner, and Azure Cosmos DB all meet the above criteria. Open source solutions such as MongoDB, Elasticsearch, Cassandra, RabbitMQ, Kafka and MinIO, or commercial multi-model offerings like Iguazio’s data platform, make the same assumptions.
Traditional enterprise block storage solutions (SAN, vSAN, HCI, NVMe over fabric, etc.) and many NAS (shared file serving) solutions are not designed to address the above requirements. They fall short particularly on atomicity (because uncommitted data and state may exist on the client side) and concurrency (because a data element can only be updated by, or is exclusively owned by, an individual consumer). This explains why hyper-scale cloud vendors don’t use SAN or NAS with cloud-native applications.
Serverless functions, the latest trend, eliminate local persistent storage and force us to use cloud-native data services, parting ways with traditional approaches.
Keep in mind that supporting data elasticity or having container APIs may not be enough to qualify as cloud-native storage.
Building Cloud-Native Data Services Efficiently
So, we need a bunch of data services to store log streams, statistics, records, documents, files, messages, etc. and there are many open source or commercial software offerings we can use, but shouldn’t we run them over traditional storage?
In most cases, running commercial or open source scale-out data services (object, messaging/streaming, NoSQL/NewSQL, log, …) over traditional or scale-out storage is redundant and inefficient, since those services already use replication or erasure coding at the service level to guarantee durability, and in some cases even support global replication and disaster recovery. Don’t spend your money on enterprise storage or hyperconverged storage with RAID or disaster recovery at the storage layer: it will cost more, and there’s a good chance it will only degrade your performance and availability due to I/O blending across disks and nodes.
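To make the redundancy cost concrete, here is a small back-of-the-envelope calculation in Python. The function name `raw_capacity_needed` and all the numbers (100 TB of data, three service-level replicas, a 2x mirroring overhead at the storage layer) are illustrative assumptions, not measurements of any product.

```python
def raw_capacity_needed(usable_tb, service_copies, storage_overhead=1.0):
    """Raw capacity required to hold `usable_tb` of application data.

    service_copies   -- replicas kept by the data service itself
    storage_overhead -- raw/usable ratio added by the storage layer
                        (1.0 means plain disks with no RAID)
    """
    return usable_tb * service_copies * storage_overhead


# Illustrative numbers only (assumptions, not measurements):
# 100 TB of data with 3 service-level replicas on plain disks.
direct_attached = raw_capacity_needed(100, service_copies=3)
# The same service on mirrored (2x) storage.
on_mirrored_array = raw_capacity_needed(100, service_copies=3, storage_overhead=2.0)

print(direct_attached, on_mirrored_array)
```

Under these assumptions the protection schemes multiply: 300 TB of raw capacity on direct-attached disks becomes 600 TB once the storage layer mirrors what the service already replicates.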
You cannot snapshot a scale-out data service state consistently at the storage layer because that state is distributed; data services maintain versioning and other forms of synchronization internally for that purpose. Furthermore, most data platforms implement compression and deduplication at the record and document level to speed up search performance. Premium storage products with snapshots or deduplication are not required under a cloud-native data service.
Use traditional block or file storage for a mixed legacy and cloud-native environment that runs on one cluster, but make sure to turn storage-level features such as RAID, snapshots and deduplication off for cloud-native data services. To go cloud-native all the way, build a data layer like cloud providers do. Use cheaper, directly attached storage or flash and leverage the resiliency and data management features built into the data services middleware.
If you have enough data to store, it’s more cost-effective to use optimized hardware with higher disk or flash density and higher network throughput than a generic server/VM.
Data Services: Buy or Rent vs. Build
As a developer, the simplest and sometimes cheapest option is to spin up a Docker container with your favorite data service (Cassandra, MongoDB, Kafka, etc.), but things get complicated when considering scale and operations. You have to maintain high availability, security, configuration management, capacity planning and performance tuning at the data service level for each of those stacks independently. DIY requires more DevOps people with broader skills, and those are not always available or affordable.
Renting or buying infrastructure in public or private clouds is a common way to reduce risks and hassles. Take it to the next level and consume higher level cloud-native data services, so that you can focus on writing your business applications vs. debugging middleware. Public cloud platforms have tested and integrated data services consumed on demand with hourly pricing. These platforms have uniform security and management across multiple services, saving us the integration and glue layers.
Public cloud offerings also raise some concerns: they may be higher priced, slower than DIY, and subject to API lock-in, and you may be unable to just ship all your data to the cloud. Fortunately, a new category of hosted or on-prem, self-service data platforms addresses those concerns.
You can buy or lease a car (virtual infrastructure) and take care of fuel, insurance, parking, maintenance, driving, etc. Or, if you are too busy, just use a ride-sharing service, pay for what you use and avoid the hassle (cloud services). We seem to be getting busier by the day.