Platform9 Raises the ‘High Availability’ Bar for OpenStack
When a cloud service provider promises “five nines” of uptime per year (i.e. 99.999 percent), it means a customer shouldn’t expect more than five-and-a-half minutes of downtime over a 12-month period.
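The arithmetic behind that figure can be checked with a short sketch. It assumes an average year of 365.25 days, and the function name is only illustrative. Under that assumption, "five nines" works out to roughly 5.26 minutes per year, which the figure above rounds up.

```python
# Downtime budget implied by an availability target over one year.
# Assumption: an average year of 365.25 days (leap days included).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the downtime allowed per year, in minutes."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction / 60


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {downtime_budget_minutes(target):.2f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the jump from four nines to five is so demanding.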
It’s something service providers currently offer their customers for hosting virtual machines on VMware infrastructure. Over the past few years, OpenStack contributors have come to realize that if they were to be taken seriously in any consideration of service quality, they needed to be able to provide these “five nines.” Last month, service provider Platform9 announced that it’s confident enough to offer what it calls high availability (HA) service for its OpenStack customers using KVM hypervisor-driven infrastructure.
In a recent company blog post, Platform9 Product Vice President Madhura Maskasky explained its strategy for delivering HA: spreading out service over multiple availability zones (AZs).
“Traditional applications assume always-available infrastructure,” wrote Maskasky. “The application architecture cannot tolerate hardware failures. . . With Platform9-managed OpenStack, such applications can be protected as well. If the application VM crashes or the underlying hypervisor node goes down, such VMs can be relocated to a new host.”
We’ve heard the phrase “highly available,” or “high availability” (in lower case), applied to Platform9 in the past. Some of that phraseology has been associated with SolidFire, an all-flash storage array company acquired by NetApp last December. Platform9 has been offering customers a means of integrating their OpenStack Cinder-based block storage with SolidFire arrays since August 2015.
This new round of HA suggests something a bit more specific. While OpenStack distributor Mirantis has been offering what it calls “highly available OpenStack” since 2012, Maskasky told The New Stack that Platform9 customers should indeed expect more from this latest round.
“Virtual Machines running on the KVM hypervisor, and managed by OpenStack, can now be run with high availability,” she wrote to us. “‘Highly available OpenStack’ refers to the availability of the OpenStack controller services, and not to the virtual machines being managed by OpenStack.”
HA, as Platform9 portrays it, refers to the services that OpenStack makes available through the virtual machines it hosts, as opposed to the availability of the underlying components of the OpenStack platform. It’s the kind of distinction that makes us want — or perhaps, need — to re-examine the way high availability and highly-available-ness have been portrayed.
This April 2014 blog post from Red Hat senior technical product manager Arthur Berezin equated highly available OpenStack controllers with High Availability. In it, Berezin explained how using a Red Hat component called Pacemaker to establish clusters, and deploying HAProxy load balancers among those clusters, resulted in highly available controllers that addressed High Availability needs.
At a presentation to OpenStack Israel 2015, Berezin explained Pacemaker as a cluster resource manager, provided as part of the operating system, that wraps itself around services and controls their current state of operation. Together with a virtual IP address and an HAProxy load balancer, Pacemaker can ensure that some copy of a service, somewhere in the network, responds to an API request.
But in Madhura Maskasky’s Platform9 blog post, she raised the bar somewhat, suggesting that true HA is defined not by the developer or the admin but by the customer. When a customer assumes a service is highly available, that service is probably the application, not the controller node. So HA, as she perceives it, means redundancy of the virtual machines being hosted.
Oh Yeah, Storage, That’s Important Too
“To recover a VM on a failed node, shared storage is needed,” wrote Maskasky in her company blog post. “All nodes in a HA cluster should use the same shared storage with identical VM storage paths to properly recover VMs on a failed node. Upon enabling HA on an AZ in OpenStack, Platform9 deploys distributed clustering services on all KVM nodes in that AZ. The clustering services use [a] gossip protocol to keep track of all nodes in the cluster.”
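The "identical VM storage paths" requirement is the kind of precondition that can be checked before HA is switched on. The sketch below is a hypothetical illustration, not Platform9's tooling. It assumes an inventory that maps each KVM node name to the path where it keeps VM disks; the node names, paths, and function names are invented for the example.

```python
from collections import defaultdict


def group_nodes_by_storage_path(node_paths):
    """Group node names by the VM storage path each one reports."""
    groups = defaultdict(list)
    for node, path in sorted(node_paths.items()):
        # Ignore a trailing slash so "/mnt/vms/" and "/mnt/vms" compare equal.
        groups[path.rstrip("/")].append(node)
    return dict(groups)


def ready_for_ha(node_paths):
    """True only when every node in the zone uses the same storage path."""
    return len(group_nodes_by_storage_path(node_paths)) == 1


# Hypothetical inventory for one availability zone.
az_nodes = {
    "kvm-1": "/mnt/shared/instances",
    "kvm-2": "/mnt/shared/instances/",
    "kvm-3": "/var/lib/local/instances",
}
print(ready_for_ha(az_nodes))
print(group_nodes_by_storage_path(az_nodes))
```

A node that reports a different path, such as `kvm-3` here, could not pick up a failed neighbor's VM, because the disk would not be where the recovered VM expects it.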
There are a handful of gossip protocols in use today. One of the first to make serious headway was SWIM [PDF], which was supposed to have been an acronym for “Scalable Weakly-consistent Infection-style Process Group Membership Protocol.”
SWIM was developed at Cornell University, with funding administered by NASA’s Jet Propulsion Laboratory. The whole point of SWIM was to replace typical, semi-reliable heartbeat protocols with an algorithm that periodically measured variances in the chatter messages, or gossip, between servers in a cluster about the membership of their various processes.
If you’re familiar with the way, in the “real world,” that political pollsters detect possible vulnerabilities in the reliability of voting districts, based on the registration patterns of voters in those districts, then you’re not unfamiliar with the basic principles of SWIM. Here, processes in a cluster of servers register for membership. The registration process generates streams of messages, which set forth a pattern of normal operation. When a pattern changes beyond a continually updated tolerance level, a failure detector component kicks in to determine the relative probability of a real failure.
The “infection” part is based on a metaphor describing how the membership signals are disseminated throughout a cluster. If a “suspicion of failure” message were multicast to multiple other IP hosts in a cluster, there would be a higher chance of the multicast message not reaching an intended recipient than of the suspicion actually being correct. That’s because IP multicast was intentionally designed to be a “best effort” approach at targeting a broad swath of addresses. “Infection” instead relies upon one member node’s propensity to chatter with another node. Using a gossip protocol, the suspicions are piggybacked onto the member nodes’ regular membership chatter messages.
Later, HashiCorp built on the SWIM protocol with its own, productized version called Serf (not “surf,” mind you, and not an acronym). According to HashiCorp’s documentation, Serf adds a dedicated gossip layer alongside piggybacking, which enables a higher rate of gossip communication together with a slower, and easier to manage, rate of failure polling.
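As the SWIM paper describes it, detection works by periodic probing: a member pings a peer directly, asks a few other members to ping that peer on its behalf if no acknowledgment arrives, and only then marks the peer as suspected. That suspicion then rides along on later pings. The sketch below is a simplified, in-memory illustration of that sequence. It is not Platform9's clustering service or Serf; the `Cluster` and `Member` classes are hypothetical, and real timers, timeouts, and network I/O are left out.

```python
import random


class Cluster:
    """In-memory stand-in for the network between cluster members."""

    def __init__(self, names):
        self.members = {name: Member(name, self) for name in names}
        self.down = set()       # names of crashed nodes
        self.cut_links = set()  # frozenset({a, b}) pairs whose packets are lost

    def deliver(self, src, dst):
        """Return the destination Member if a message from src would arrive."""
        if dst in self.down or frozenset((src, dst)) in self.cut_links:
            return None
        return self.members[dst]


class Member:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.suspected = set()  # names this member currently suspects

    def ping(self, target, piggyback=()):
        """Direct probe. Suspicions are piggybacked on the ping itself."""
        peer = self.cluster.deliver(self.name, target)
        if peer is None:
            return False  # no acknowledgment came back
        peer.suspected.update(s for s in piggyback if s != peer.name)
        return True

    def probe(self, target, k=2, rng=random):
        """One simplified protocol period aimed at a single target."""
        if self.ping(target, self.suspected):
            self.suspected.discard(target)
            return "alive"
        # Indirect probe: ask up to k other members to ping the target.
        others = [n for n in self.cluster.members if n not in (self.name, target)]
        for helper_name in rng.sample(others, min(k, len(others))):
            helper = self.cluster.deliver(self.name, helper_name)
            if helper is not None and helper.ping(target):
                return "alive"
        self.suspected.add(target)
        return "suspected"


cluster = Cluster(["a", "b", "c", "d"])
cluster.cut_links.add(frozenset(("a", "b")))  # a cannot reach b directly
print(cluster.members["a"].probe("b"))        # a helper still reaches b
cluster.down.add("c")                         # c really has crashed
print(cluster.members["a"].probe("c"))        # neither direct nor indirect ping succeeds
cluster.members["a"].probe("d")               # suspicion of c rides on this ping
print(cluster.members["d"].suspected)
```

The indirect step is what keeps a single lost packet or broken link between two members from being mistaken for a node failure.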
High Highly Available Ability
In a demo of HA service, Platform9’s Cody Hill demonstrated how customers can switch on high availability for both cloud-native and legacy, VM-hosted applications: HA from the customer’s perspective. Hill showed how an AZ represents a failure zone within or between data centers, and how a customer can designate the specific host addresses for each zone. Running instances of an app in more than one of these zones, and specifying the AZ addresses in the configuration, lets the app keep serving users when a single zone fails, which is the point of true HA.
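The idea of treating each AZ as a failure zone can be illustrated with a small placement sketch. This is a hypothetical round-robin spread, not the scheduler Platform9 or OpenStack actually uses; the zone names, instance names, and function names are invented.

```python
from itertools import cycle


def spread_across_zones(instance_names, zones):
    """Assign instances to zones round-robin so no zone holds them all."""
    placement = {zone: [] for zone in zones}
    for name, zone in zip(instance_names, cycle(zones)):
        placement[zone].append(name)
    return placement


def survivors(placement, failed_zone):
    """List the instances still running after one zone is lost."""
    return [
        name
        for zone, names in placement.items()
        if zone != failed_zone
        for name in names
    ]


placement = spread_across_zones(["web-1", "web-2", "web-3"], ["az-1", "az-2"])
print(placement)
print(survivors(placement, "az-1"))
```

As long as at least one instance lands outside the failed zone, the application as a whole keeps answering requests while the lost instances are replaced.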
Platform9 can then automatically compose scalability policies across the AZs, based on alerts it receives. Hill’s cloud-native demo simulated the failure of a compute node on an OpenStack host where the results of a database query are updated every five seconds. Even with the node failure, the user of the app continued to see a continually updated clock, by virtue of a properly orchestrated failover.
In the non-scaling VM portion of the demo, Hill spun up a VM in one AZ. One instance of the VM was taken down, and soon, a new instance was spun up with a new IP address. This time, there was noticeable downtime as the new VM was being spun up — an interval which Hill first described as “a little bit.” The recording was condensed, so the time for the new instance to receive a new address consumed about three minutes, by Hill’s watch. Availability of the service on that address then required another half-minute or so.
That’s under the five-and-a-half-minute mark stipulated by the “five nines” rule of downtime. But it means that this same server cannot afford another failure incident for the remainder of the year.
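To see why a second incident would break the promise, the outage from the demo can be turned into an achieved-availability figure. The sketch assumes an average year of 365.25 days and takes the outage as about three and a half minutes (three minutes for the new address plus roughly half a minute for the service). The function name is illustrative.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def achieved_availability(outage_seconds):
    """Return yearly availability, in percent, given a list of outage lengths."""
    return 100 * (1 - sum(outage_seconds) / SECONDS_PER_YEAR)


demo_outage = 3 * 60 + 30  # about three and a half minutes, per the demo

print(f"one incident:  {achieved_availability([demo_outage]):.5f}%")
print(f"two incidents: {achieved_availability([demo_outage, demo_outage]):.5f}%")
```

One such failover keeps the year above 99.999 percent, while a second one of the same length drops it below.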
Maskasky’s blog post points to an OpenStack component called Masakari as responsible for rescuing VMs automatically. Masakari was introduced to engineers in Tokyo at OpenStack Summit last October, by Masahito Muroi, an engineer with NTT. The project is currently being open sourced through GitHub.
“High Availability for virtual machines is being developed in a community-centric manner, and will be fully open-sourced,” stated Maskasky in a note to The New Stack. “Platform9 itself packages these capabilities as an out-of-the-box feature in its Managed OpenStack SaaS solution.”