Storage and the cloud have been joined at the hip since Amazon launched Amazon Web Services and S3 over a decade ago. Unfortunately, as the cloud has begun to move away from virtual machines and toward containers, the job of persistent storage has only gotten harder.
The whole point of containers, after all, is that you can throw them away when they break and simply fire up another in their place. Storage is basically the opposite: when a storage instance breaks, you have to fix it, or the data inside may be lost or corrupted for good. Data must remain when containers vanish.
That implies attaching storage to a container, but it also implies long wait times as file systems are allocated, given the proper permissions, and have their data copied over from a long-term storage system. This is the antithesis of on-demand, ephemeral computing.
Enter StorageOS. In a marketplace where the last big crop of new storage companies were all acquired seven years ago, and all of them required you to buy physical boxes, StorageOS has its work cut out for it.
That’s because the company’s primary focus is on container-based persistent storage, filling a void in the market, noted Alex Chircop, co-founder and CTO of StorageOS, who in a former life ran storage platform engineering at Goldman Sachs. He said that existing products for storage in container environments are mostly halfway solutions.
Chris Brandon, co-founder and CEO of StorageOS, said that their product is a “container-based storage array that is completely portable, orchestration agnostic, kernel agnostic, runs at the application layer, configures volumes to containers, and offers a policy-based rules engine for placement of what type of storage it gets stored on. We take care of the replications, snapshots, clones, and all the performance associated with that storage.”
“One of the reasons we started to work on StorageOS is that when Chris was trying to build a storage system in a cloud environment, he found it was hard to configure solutions because of a lack of API integrations, but also because it’s tough to move data around with applications,” said Chircop. “We’ve designed this API integration from day one to work with Docker or Kubernetes.”
And that’s the big difference between StorageOS and other cloud-based file system solutions, according to the company: StorageOS is designed from the ground up to be driven entirely by an API. That may seem odd for a storage system, but the rest of the cloud is configured and handled by APIs, so adding storage to that mix makes it more accessible, and easier to control programmatically.
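As an illustration of what API- and CLI-driven provisioning looks like from the container side, a Docker volume-plugin workflow might run along these lines. This is a hedged sketch: it assumes a StorageOS volume plugin is installed, and the volume name and size option are hypothetical, not taken from StorageOS documentation.

```shell
# Illustrative sketch: assumes a "storageos" Docker volume plugin is installed.
# Create a named volume through the plugin, then hand it to a container.
docker volume create --driver storageos --opt size=10 pgdata
docker run -d -v pgdata:/var/lib/postgresql/data postgres
```

The point is that the storage request is just another API call in the same workflow that starts the container, rather than a separate provisioning step handled by an operations team.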
Chircop said that other solutions do not meet all the needs of cloud-based storage. “Chef and Puppet take time. Existing systems are designed to present storage to OS instances rather than the containers or applications themselves. Rather than just defining or declaring what an application needs in terms of volume and attributes, you also have the extra complexity of having to map the application onto those nodes that have storage attached. We’re running at the application level, presenting storage to the application or container, so we bypass that level of complexity.”
The team has abstracted the storage layer from the lower levels usually associated with hard drives and persistent file stores.
The product is deployed as a container, and within that container, a control plane manages the configuration within a cluster and the data planes that provide the functionality for the data part of your volumes. That control plane manages the health, configuration, and cluster status, and schedules changes on those volumes. The data plane has a virtualization layer that allows it to see the different types of storage available in each of the nodes: physical disks on bare metal, or virtual disks in VMs.
“We build a global, highly available storage pool out of that, which we then carve volumes out of,” Chircop said. Those volumes are named and can then be attached to running containers.
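In Kubernetes terms, named volumes carved from a shared pool map naturally onto dynamically provisioned PersistentVolumeClaims. Below is a minimal illustrative config; the claim name, StorageClass name, and its binding to a StorageOS-style provisioner are all assumptions, not details from the article.

```yaml
# Illustrative only: assumes a StorageClass named "fast" exists and is
# backed by a dynamic provisioner such as StorageOS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: fast
  accessModes:
    - ReadWriteOnce        # attached to one node's containers at a time
  resources:
    requests:
      storage: 5Gi         # a 5 GiB volume carved out of the shared pool
```

A pod then references `app-data` by name in its volume spec, and the cluster attaches the volume wherever the pod is scheduled.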
With so many existing solutions out there, StorageOS has had to focus very specifically on its ideal use cases. As a result, Chircop says it can do things most other cloud-based storage solutions cannot. GlusterFS, for example, is a shared file system, and while it can be globally available to many users at once, latency can become a big problem due to consistency concerns, said Chircop.
Similarly, NFS is unsuitable for container use, said Chircop. “NFS is a point-to-point protocol, so you have a file system somewhere which is being exported through an NFS server, and the server itself is always going to be a single point of failure. There are a variety of ways to cluster NFS, but they tend to be in commercial solutions or appliances. While NFS is ubiquitous because it’s one of the oldest file-sharing protocols there is, it’s a stateless service. The state gets maintained on the client and things like locking and consistency are an exercise for the application to handle.
“While it’s very common to use NFS for sharing files,” said Chircop, “real-life applications for changing files and posting stuff on NFS tend to have a fair level of complexity, and there are really complex solutions when NFS servers fail because they lose their locking states and there are a lot of corruption opportunities there.
“Typically, because NFS isn’t the most performant protocol,” said Chircop, “there’s always a certain amount of caching. Some of the consistency issues you see in applications arise if things like locking aren’t implemented properly. We do see a use case for NFS, and for shared file systems like Gluster. We do think, however, it gets very complex to expose services like NFS with containers because containers are typically operating within private IP namespaces, so each container will have its own network service definition, and the open file handles that are maintained within NFS are effectively a hash of an IP address and a file handle.”
Naturally, the cloud providers themselves offer the most cloud-specific file hosting solutions. Chircop specifically said, however, that StorageOS offers better replication and global-distribution capabilities than Amazon’s Elastic Block Store (EBS).
“All of the cloud providers have a concept of availability zones. Each availability zone has a unique set of infrastructure and APIs around it, which is supposed to be completely detached from the other availability zones. By nature, Amazon doesn’t offer replication capabilities across availability zones. You have to use third party software to do that,” said Chircop. “Because we can virtualize the replicas across one or more availability zones, we can make sure your volumes are highly available and survive either an individual node failure or availability zone failure.”
That’s not even the biggest issue with EBS, said Chircop. “EBS is more like a traditional disk: it’s getting presented to an individual EC2 instance. What happens when your container needs to move about? That EBS instance has to be detached and reattached to another EC2 node and configured. Sometimes these processes can take fairly long: we’ve seen up to an hour,” said Chircop. “The simple concept of moving a container from one node to another becomes much more complicated. Also, there’s no global namespace.”
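The manual dance Chircop describes can be sketched with the standard AWS CLI. The volume and instance IDs below are placeholders, and in a real migration these steps would be wrapped in error handling rather than run by hand.

```shell
# Illustrative sketch of moving an EBS volume between EC2 instances.
# Each step blocks on AWS-side state changes, which is where delays creep in.
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 wait volume-available --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 --device /dev/sdf
```

Every container rescheduling that crosses a node boundary pays this detach-wait-attach cost, which is the friction StorageOS claims to remove by virtualizing volumes above the disk layer.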
Having worked at Goldman Sachs, however, Chircop said there is one EBS sin that can never be accepted by financial institutions. “Although EBS has a built-in encryption functionality, it requires that AWS controls the encryption key. That is not OK for banks and governments,” said Chircop. StorageOS does not have this requirement for on-disk encryption.
StorageOS is still ramping up for full general availability, but the company expects to be open to all users by the end of the year. Its roadmap also includes the addition of shared file systems across multiple containers, a new feature it anticipates will be available later this year as well.
Feature image via PXHere.