The Scalability Myth
Lots of object storage companies like to toss around the term exabytes and talk about infinite scalability. When they make those claims, however, they are invariably talking about static, archival data.
Realistically speaking, scaling to exabytes and beyond isn’t that hard when the data is sitting in cold storage. Yet this isn’t what modern enterprises want. They want security, scalability and performance, because they want to interact with that data at scale, for AI/ML workloads and for advanced analytics platforms like Splunk.
We see it in the data. A poll from The New Stack’s 2020 Next Generation Infrastructure Survey is a great example.
So, what’s required to deliver a truly scalable storage system? To start with, object storage. Given that every hyperscale system is built on top of object storage, this is a bit of a no-brainer. What are the other components, though? Here’s our take:
- System Scalability: To deliver against the scalability requirement, the entire system needs to be scalable. Systems that use Cassandra as their metadata database are an example of designs that don’t scale linearly. Because Cassandra is better at writes than at reads, you are limited in what you can do with the data as it grows. Cassandra also handles large-scale operations such as mass deletes poorly. If any part of the system can’t scale, the project as a whole fails to deliver on scalability. The entire system needs to scale elegantly, seamlessly and without issue for all kinds of workloads, from Artifactory storage and snapshots to machine learning pipelines.
- Performance: Performance can be evaluated along two dimensions: raw, straight-line performance and performance at scale. The difference is simple. Benchmarking your object store with a few TBs of data may produce some nice numbers, particularly if you set erasure coding, quorum consistency, encryption and bitrot protection to low protection levels (or turn them off entirely). The real test, however, is sustaining that performance across multiple PBs for all kinds of access patterns and object sizes. Without that scalable performance, you can realistically operate on only a fraction of your data. AI/ML use cases are trending not just toward massive amounts of data but also, increasingly, toward what is called “dark data.” Dark data holds secrets, but it is generally forgotten or archived for performance reasons (it is too big) or for cost. Modern object stores need to deliver performance across the continuum of scale. Selecting an object store that can do that ensures the organization can unlock all of the value in that data, not just a fraction of it.
- Security: Security was overwhelmingly the top answer among respondents to The New Stack survey, but this shouldn’t be news to anyone. Storing data includes protecting it from loss and from unauthorized access. In the case of ransomware, the two go together: unauthorized access results in loss. On the continuum of bad outcomes, a security breach is the worst, because once the data is exposed, the problem compounds and you lose control over it. This is why security must scale too. Security can’t carry so much performance overhead that it keeps you from running it all the time. Scalable encryption should protect data everywhere: in flight (TLS) and at rest (encryption with keys held in a KMS). Access management (authentication and authorization) and object locking should also be part of the security portfolio, and all of them should scale if you want to deliver comprehensive protection. Taken together, these are monumental requirements that most object stores cannot meet. As a result, enterprises compromise, with predictable results.
- Operational Scale: Operational scale is the ability to manage massive infrastructure with just a handful of people, or even just a couple spread across time zones. Some call it maintainability, and we like that term too. We are less keen on framing it as total cost of ownership, though, because you can’t “value engineer” maintainability. Either you can put one person in charge of a multitenant, petascale object-storage-as-a-service instance, or you can’t. If that instance needs a team of six to look after security, networking, drives, CPUs, resilience, SLAs, downtime, upgrades and so on, then the solution is not truly maintainable. Infrastructure functionality needs to be manageable, transparent and simple without sacrificing control or granularity. Over time, OPEX is orders of magnitude higher than CAPEX. The ability to scale is a function of the software you select. Simple, powerful software wins every time, because operational scalability is a software problem, not a people problem.
- Software-Defined: While appliance vendors will argue this point aggressively, the fact is that software-defined solutions scale better when they are properly defined. By “properly defined,” we mean they run on any commodity hardware, VMs or containers, as well as on popular operating system distributions, not just on a couple of tightly defined boxes from a handful of big-name vendors. When software is released frequently and hardware is refreshed often, it becomes nearly impossible to keep a Hardware Compatibility List (HCL) validated; most of any HCL is obsolete to begin with. Yes, AWS controls the hardware in its stack, but there is massive variation within that hardware, and we give them full marks for how they manage it. When you are truly software-defined and can run anywhere, the hardware really does become a commodity. Design the data lifecycle around the data, not the hardware spec. The software handles the heterogeneity across media, models and even brands. You can shop for the best price and take advantage of quarter-end blowouts. Design your systems with SSD and HDD, and tier across them using information lifecycle management (ILM). Use the public cloud as cold storage. Kubernetes is the driver of that software-defined scale. Software shouldn’t worry about the underlying infrastructure, whether public cloud or bare-metal private cloud. Let Kubernetes abstract the infrastructure and roll out your object storage as software containers. We have said it before, but it bears repeating: you can’t containerize an appliance.
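The performance point about tuning erasure code to “low protection levels” comes down to a trade-off between usable capacity and failure tolerance. The sketch below assumes a generic Reed-Solomon-style layout, in which each object is split into k data shards plus m parity shards spread over n = k + m drives. The `ErasureLayout` class, the `describe` helper and the example layouts are hypothetical illustrations, not any particular product’s defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureLayout:
    """Hypothetical erasure-code stripe: k data shards plus m parity shards."""
    data_shards: int
    parity_shards: int

    @property
    def total_shards(self) -> int:
        return self.data_shards + self.parity_shards

    @property
    def storage_efficiency(self) -> float:
        # Fraction of raw capacity that holds user data: k / (k + m).
        return self.data_shards / self.total_shards

    @property
    def write_amplification(self) -> float:
        # Bytes written to disk per byte of object data: (k + m) / k.
        return self.total_shards / self.data_shards

    @property
    def tolerated_drive_failures(self) -> int:
        # An MDS code such as Reed-Solomon can rebuild an object from any k shards,
        # so up to m shards (drives) in the stripe can be lost.
        return self.parity_shards


def describe(layout: ErasureLayout, raw_tib: float) -> str:
    usable = raw_tib * layout.storage_efficiency
    return (f"EC {layout.data_shards}+{layout.parity_shards}: "
            f"{usable:.0f} TiB usable of {raw_tib:.0f} TiB raw, "
            f"{layout.write_amplification:.2f}x bytes written, "
            f"survives {layout.tolerated_drive_failures} drive failures")


if __name__ == "__main__":
    # Same 16-drive stripe width at three hypothetical protection levels.
    for layout in (ErasureLayout(14, 2), ErasureLayout(12, 4), ErasureLayout(8, 8)):
        print(describe(layout, raw_tib=1600))
```

A lower parity count means fewer parity bytes to compute and write per object, which can flatter a small benchmark, but the same stripe then survives fewer simultaneous drive failures. That is why a benchmark is only meaningful when it states the protection level it ran at.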
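Tiering across SSD, HDD and the public cloud with ILM can be pictured as an age-based placement rule. The sketch below is a hypothetical illustration: the tier names, the 30- and 180-day thresholds and the `choose_tier` and `plan_transitions` helpers are assumptions for this example, not any product’s ILM syntax. Real systems typically express such rules declaratively (for example, as bucket lifecycle transition rules) and let the storage software move the objects.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from enum import Enum


class Tier(Enum):
    # Hypothetical tiers: fast local SSD, dense local HDD, public-cloud cold storage.
    SSD = "hot-ssd"
    HDD = "warm-hdd"
    CLOUD = "cold-cloud"


# Hypothetical age thresholds; real ILM rules would be configured per bucket or prefix.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=180)


def choose_tier(last_modified: datetime, now: datetime) -> Tier:
    """Pick a target tier from object age alone, the simplest kind of lifecycle rule."""
    age = now - last_modified
    if age >= COLD_AFTER:
        return Tier.CLOUD
    if age >= WARM_AFTER:
        return Tier.HDD
    return Tier.SSD


def plan_transitions(objects: dict[str, tuple[datetime, Tier]], now: datetime) -> dict[str, Tier]:
    """Return only the objects whose target tier differs from the tier they live on today."""
    moves: dict[str, Tier] = {}
    for key, (last_modified, current) in objects.items():
        target = choose_tier(last_modified, now)
        if target is not current:
            moves[key] = target
    return moves


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    inventory = {
        "logs/today.json": (now - timedelta(days=1), Tier.SSD),
        "logs/q3.json": (now - timedelta(days=90), Tier.SSD),
        "archive/2022.tar": (now - timedelta(days=400), Tier.HDD),
    }
    for key, tier in plan_transitions(inventory, now).items():
        print(f"{key} -> {tier.value}")
```

Keying the rule on the data (its age) rather than on a particular drive model mirrors the point above: the policy describes the data, and the software maps it onto whatever hardware sits behind each tier.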
Scalability is a multidimensional problem. It doesn’t get the attention it deserves, because very few vendors want to discuss it outside of their own narrowly defined success criteria. This is bad for the industry as a whole, because it ignores the things that really matter: security, performance and maintainability. We invite you to consider a more comprehensive list, in the hope that it leads to better questions for your current vendors and better system design going forward.