As the chief technology officer of a company that specializes in helping customers run stateful services in containers, I’m often asked, “Shouldn’t I just use Amazon EBS (Elastic Block Store) as a container persistence layer?” My answer, in a word, is no. This post shares four reasons why you are selling your apps’ performance short by stopping at EBS. To be clear, this is not an argument against running on Amazon Web Services. We love AWS! It is about the specific performance penalties you incur by using EBS as your persistence layer via a Docker, Kubernetes or Mesos plugin such as the native EBS plugin, Flocker or RexRay. A second part of this article will go into ways to overcome these issues. For now, it’s enough to simply understand them.
The question is a vital one for organizations looking to move into cloud-native development and deployment. As enterprises continue to adopt containers, architects and DevOps teams are looking at ways to run stateful apps as part of their container platforms. The results so far are very promising. Enterprises across a spectrum of industries, such as GE Digital, DreamWorks and Lufthansa, are seeing success containerizing their entire applications, not just the stateless parts.
1. Slow Mount Times and Stuck Volumes = Slow Deployments
EBS is a storage area network (SAN), so it takes a certain amount of time for a block device to be attached to an Amazon EC2 instance. As far as the Linux kernel is concerned, these drives are neither nimble nor plentiful resources, and attaching them to hosts is an expensive operation.
In the best-case scenario, spinning up a stateful container on an EC2 instance takes 30 seconds to two minutes. That is because when a container is deployed and needs a volume, that volume must be provisioned on EBS, then attached to the host before the container can start. The attach operation itself is the long pole and simply takes time.
As Tom Jackson, Lead Software Engineer at Nordstrom, described in his excellent Container World talk, slow mount times are just one problem. The other, more serious issue is that EBS volumes can frequently get stuck in an “attaching” state. A cursory Google search for stuck EBS volumes reveals that many users have this same problem. The terrible consequence of a stuck EBS volume is a host reboot, so now that “lightweight container that spins up in milliseconds” is going to take five minutes or more to come online.
That is not the agility we were promised.
In addition, if an EC2 instance dies with a volume attached to it, similar bad things can happen, and they most often result in a time-consuming host reboot.
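To make the attach wait and the stuck “attaching” state concrete, here is a minimal sketch of the polling loop a volume driver might run before it can start the container. Everything in it is an assumption for illustration: `describe_state` is a hypothetical stand-in for whatever call reports the volume's state (it is not an AWS SDK function), the clock is simulated so the example runs instantly, and the 45-second and 300-second figures are arbitrary values, not measurements.

```python
import itertools
import time
from typing import Callable, Optional


class AttachTimeout(Exception):
    """Raised when a volume is still not attached when the deadline passes."""


def wait_until_attached(
    describe_state: Callable[[], str],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until the volume reports 'attached'; return the seconds waited."""
    start = clock()
    while True:
        state = describe_state()
        waited = clock() - start
        if state == "attached":
            return waited
        if waited >= timeout_s:
            raise AttachTimeout(f"volume still '{state}' after {waited:.0f}s")
        sleep(poll_interval_s)


class FakeClock:
    """Simulated time source (hypothetical helper for this example only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def fake_volume(polls_until_attached: Optional[int]) -> Callable[[], str]:
    """Report 'attaching' for N polls, then 'attached'. None means stuck forever."""
    calls = itertools.count()

    def describe_state() -> str:
        n = next(calls)
        if polls_until_attached is not None and n >= polls_until_attached:
            return "attached"
        return "attaching"

    return describe_state


# Healthy case: the container cannot start until the wait is over.
clock = FakeClock()
waited = wait_until_attached(fake_volume(45), timeout_s=300,
                             clock=clock.time, sleep=clock.sleep)
print(f"attached after {waited:.0f}s")  # attached after 45s

# Stuck case: the loop can only give up; it cannot unstick the volume.
stuck_clock = FakeClock()
try:
    wait_until_attached(fake_volume(None), timeout_s=300,
                        clock=stuck_clock.time, sleep=stuck_clock.sleep)
except AttachTimeout as err:
    print(f"gave up: {err}")
```

The point of the sketch is that a timeout only detects a stuck volume. Recovering from it is a separate and, as described above, often much slower step.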
2. Slow Failover Means No High Availability
Clearly, a model of one EBS volume per stateful container comes with a startup penalty. What does this penalty mean for failover when using EBS with a volume driver? Let’s look at what happens if we use Kubernetes to fail over a stateful container backed by an EBS volume via an EBS volume driver.
Because we are enforcing a one-to-one relationship between our EBS drives and containers (so we get automatic failover), we trigger the following error-prone sequence of events every time the container moves:
- The EBS volume detaches from the unresponsive EC2 instance.
- The EBS volume attaches to the new node.
- The EBS volume mounts to the new container.
Each of these steps requires an API call to AWS, commands run as root on a node, or both.
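The three steps above form a strictly ordered chain, so a failure in any one of them blocks everything after it. Here is a minimal sketch of that property. The step functions `ok` and `broken` are hypothetical placeholders for the real API calls and root commands; no actual driver code is being quoted.

```python
from typing import Callable, List, Tuple


class StepFailed(Exception):
    """Raised by a step whose API call or root command did not succeed."""


def run_failover(steps: List[Tuple[str, Callable[[], None]]]) -> Tuple[List[str], str]:
    """Run the steps in order and stop at the first failure.

    Returns (names of completed steps, name of the failed step or "").
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except StepFailed:
            return completed, name
        completed.append(name)
    return completed, ""


def ok() -> None:
    """Placeholder for a step that succeeds."""


def broken() -> None:
    """Placeholder for a step that fails."""
    raise StepFailed()


healthy = [("detach", ok), ("attach", ok), ("mount", ok)]
stuck_detach = [("detach", broken), ("attach", ok), ("mount", ok)]

print(run_failover(healthy))       # (['detach', 'attach', 'mount'], '')
print(run_failover(stuck_detach))  # ([], 'detach')
```

In the second run nothing completes: while the detach is stuck, the container has no volume on any node.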
A variety of problems can occur when these events happen:
- The API call to AWS fails (for some spurious reason).
- The EBS drive cannot be unmounted or detached from the old node (for some spurious reason).
- The new node already has too many EBS drives attached.
- The new node has run out of mount points.
In the best cases, we’re looking at failover taking several minutes. In the worst cases, we’re looking at 10 minutes or so of downtime just to move a stateful container. That is not high availability (HA).
In addition, even if we are OK with waiting 10 minutes for a failover, volumes cannot be moved across Availability Zones, making this approach to HA a non-starter for non-trivial applications.
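Two of the constraints in this section, the per-node attachment limit and the Availability Zone boundary, can be expressed as a simple eligibility check on candidate nodes. The sketch below is illustrative only: the `Volume` and `Node` types are hypothetical, and `max_volumes` is an assumed parameter, since real limits depend on the instance type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    vol_id: str
    zone: str


@dataclass
class Node:
    name: str
    zone: str
    attached_volumes: int
    max_volumes: int  # assumed limit, for illustration only


def placement_problems(volume: Volume, node: Node) -> List[str]:
    """Return the reasons this node cannot take over the volume (empty if none)."""
    problems = []
    if node.zone != volume.zone:
        problems.append("different availability zone")
    if node.attached_volumes >= node.max_volumes:
        problems.append("attachment limit reached")
    return problems


def eligible_nodes(volume: Volume, nodes: List[Node]) -> List[str]:
    """Names of the nodes the container could fail over to with its volume."""
    return [n.name for n in nodes if not placement_problems(volume, n)]


volume = Volume("vol-example", zone="us-east-1a")
nodes = [
    Node("node-1", zone="us-east-1a", attached_volumes=10, max_volumes=10),
    Node("node-2", zone="us-east-1b", attached_volumes=2, max_volumes=10),
    Node("node-3", zone="us-east-1a", attached_volumes=3, max_volumes=10),
]

print(eligible_nodes(volume, nodes))  # ['node-3']
```

If the whole zone is lost, the list is empty no matter how many healthy nodes exist elsewhere.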
3. Poor I/O, Unless You Want to Spend a Lot
Let’s say you don’t care about mount times, and HA is not a requirement for your app. What about performance? It is well known that, in addition to being a single point of failure, SANs cannot provide the I/O and throughput performance of direct-attached storage. Cassandra expert DataStax has a great article outlining the limitations of network-attached storage such as EBS for this particular application, but the problem is bigger than just Cassandra. Yes, EBS offers dedicated IOPS options, but these are very costly, and you will still suffer the container startup penalty and, as a result, have to forgo HA. It is cheaper to use a storage-optimized EC2 instance with local solid-state drives (SSDs) than to pay extra for dedicated IOPS on Amazon’s EBS.
4. Volume Orchestration via a Storage Connector is Fragile
So far, we’ve focused only on the limitations of EBS itself. However, there is another problem related to how a containerized application manages EBS. When we run a containerized application, the scheduler (Kubernetes, Swarm or DC/OS) interacts with a volume plugin manager, or connector, to create, mount, unmount and delete volumes. Examples of connectors include the native EBS driver for Kubernetes and Docker, Flocker, or RexRay from EMC.
These connectors manage the usage and association of containers to volumes. When containers are brought up, schedulers make a request to the volume manager to mount the volume; when containers go away, schedulers request that their volumes be unmounted. This is not foolproof, mainly because volume managers are typically external processes (or containers) and the interaction between the schedulers and volume managers is loosely coupled. This means that it is possible to “lose” an unmount request, which causes undesirable side effects.
For example, if an unmount request is lost, the volume stays attached to the old node. Because a block device can be attached to only one host at any given point, the container cannot be spun up on any other node. This, in turn, implies service/data unavailability.
Unfortunately, this problem cannot be solved with volume managers that deal with EBS. The problem lies in the way storage is exported — via iSCSI, network block device (NBD) or any other mechanism that comes off the network. If an unmount request is lost, the volume plugin manager incorrectly thinks the block device is in use, and the storage provider cannot determine if it actually is in use.
Systems that do not rely on a connector, on the other hand, can always determine if the volume is in use or not because they know if any process has the block device open.
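The lost-unmount failure described in this section can be reproduced with a toy model. It is a sketch under stated assumptions, not the behavior of any particular connector: `VolumeManager` and `Host` are hypothetical classes, and the `unmount_delivered` flag simply simulates a request that never reaches the connector.

```python
class VolumeManager:
    """Toy connector that tracks usage purely from the requests it receives."""

    def __init__(self) -> None:
        self.in_use_by = {}  # volume id -> node name, as the connector believes

    def mount(self, vol_id: str, node: str) -> None:
        owner = self.in_use_by.get(vol_id)
        if owner is not None and owner != node:
            raise RuntimeError(f"{vol_id} is believed to be in use on {owner}")
        self.in_use_by[vol_id] = node

    def unmount(self, vol_id: str) -> None:
        self.in_use_by.pop(vol_id, None)


class Host:
    """Toy host that knows which volumes its processes actually have open."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open_volumes = set()


def start_container(host: Host, manager: VolumeManager, vol_id: str) -> None:
    manager.mount(vol_id, host.name)
    host.open_volumes.add(vol_id)


def stop_container(host: Host, manager: VolumeManager, vol_id: str,
                   unmount_delivered: bool = True) -> None:
    host.open_volumes.discard(vol_id)  # the process really does close the device
    if unmount_delivered:
        manager.unmount(vol_id)        # skipped when the request is "lost"


manager = VolumeManager()
node_a, node_b = Host("node-a"), Host("node-b")

start_container(node_a, manager, "vol-1")
stop_container(node_a, manager, "vol-1", unmount_delivered=False)

print("open on node-a:", "vol-1" in node_a.open_volumes)  # open on node-a: False
try:
    start_container(node_b, manager, "vol-1")
except RuntimeError as err:
    print("rescheduling blocked:", err)
```

The host's own view (nothing has the volume open) is correct, while the connector's bookkeeping is stale, and the connector is the one the scheduler asks.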
Amazon has done more to change enterprise IT for the better than any other company in the last 20 years; it has created nothing short of a revolution. But when it comes to the persistence layer for Docker containers, EBS leaves a lot to be desired in startup time, HA, performance and reliability. Just as importantly, EBS does not provide important container data services that modern applications require, such as cross-AZ (Availability Zone) failover, cross-cloud migration, offsite backups and container-granular encryption. EBS is simply a storage solution, not a data services platform built for DevOps. As you evaluate your next stateful container project, I encourage you to ask, “Do I really want to settle for EBS?”