
ZStack: Infrastructure Software with an In-Process Microservices Architecture

1 May 2015 11:36am, by

Editor’s Note: Developing cloud services can be notoriously complex. Frank Zhang is seeking to simplify the development of IaaS environments with ZStack, an open source project that is designed on the principles of simplicity, scale and extensibility.

Have an open-source project and want to explain it? Send it to us at The New Stack and we’ll consider it for publication.

ZStack is open source IaaS (infrastructure as a service) software designed to automate datacenters and manage compute, storage and networking resources through APIs. Its architecture is designed to remove the obstacles that keep enterprises from adopting private clouds. Simplicity, stability, scalability and extensibility were the main concerns of ZStack’s architectural design and were taken into consideration from the very beginning.

ZStack uses a so-called in-process microservices architecture, which runs all services in a single process. Deploying a ZStack management node simply means deploying a standard Java WAR file into a web servlet container, a well-known technology defined by the Java specifications. On Linux, ZStack has only a few external dependencies, including MySQL, RabbitMQ and Ansible, all of which are packaged by every major Linux distribution. Users can set up a single-node deployment as follows:


For a production environment that requires high availability (HA) and scale-out, users can move MySQL and RabbitMQ to separate machines and extend the single management node to multiple management nodes:


Because a datacenter usually manages a large number of external devices (e.g., physical servers and storage), agents usually need to be installed on those devices. Instead of requiring users to install them manually, which can be tedious and daunting, ZStack integrates Ansible to fully automate agent deployment. For example, when users add a KVM host to ZStack, a KVM agent must be installed on the physical server along with its dependent packages, including qemu-kvm, libvirt, qemu-utils and iptables; the AddKvmHost API makes ZStack call Ansible to accomplish this automatically. A snippet of the KVM Ansible YAML file looks like this:


Advanced users can extend the YAML file with their own packages to carry out maintenance tasks, for example, applying a critical security fix to all KVM hosts. YAML configurations and agents are all packaged inside the Java WAR file, and ZStack locates them through the Java classpath. The process is transparent to users, who may not even notice that the agents exist.
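Locating files bundled in the WAR through the classpath needs nothing beyond the standard ClassLoader API. The sketch below is illustrative, not ZStack’s code: the class name ClasspathResources and the resource path ansible/kvm/kvm.yaml are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: reads a file (e.g. an Ansible YAML playbook) that is
// packaged inside the WAR and therefore reachable through the classpath.
final class ClasspathResources {
    private ClasspathResources() {}

    /** Returns the resource's text, or Optional.empty() if it is not on the classpath. */
    static Optional<String> read(String path) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathResources.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(path)) {
            if (in == null) {
                return Optional.empty(); // not packaged; the caller decides how to react
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class ClasspathDemo {
    public static void main(String[] args) {
        // Illustrative path only, not ZStack's real layout.
        Optional<String> playbook = ClasspathResources.read("ansible/kvm/kvm.yaml");
        System.out.println(playbook.isPresent() ? "playbook found" : "playbook not on classpath");
    }
}
```

Because the lookup goes through the class loader rather than the file system, the same code works whether the files sit in an exploded directory or inside a packed archive.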

ZStack monitors the health of both virtual and physical resources. Administrators can watch status changes through a web UI, e.g., host connection status, virtual machine states and physical storage connectivity. When the status of a resource changes, for example when a virtual machine stops unexpectedly, ZStack detects the change within a configurable interval and synchronizes the real status into the database. ZStack also exposes the internal status of management nodes through Java JMX (Java Management Extensions). Administrators can watch statistics such as currently running tasks, messages being processed, the maximum and average message processing times and accumulated tasks, and use them to decide whether to add management nodes to split the system workload.
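The detect-and-synchronize behavior can be pictured as a periodic probe that writes changed states back to the database. This is a minimal sketch under stated assumptions: StatusSyncer is a hypothetical name, an in-memory map stands in for the database, and the probe function stands in for whatever host or agent check ZStack actually performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical status tracker. The map stands in for the database table
// and `probe` for a real check against a host or agent.
final class StatusSyncer {
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Function<String, String> probe; // resource UUID -> real status
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    StatusSyncer(Function<String, String> probe) {
        this.probe = probe;
    }

    void track(String uuid, String recordedStatus) {
        database.put(uuid, recordedStatus);
    }

    String recordedStatus(String uuid) {
        return database.get(uuid);
    }

    /** Probes every tracked resource once and returns how many records were corrected. */
    int syncOnce() {
        int changed = 0;
        for (Map.Entry<String, String> row : database.entrySet()) {
            String real = probe.apply(row.getKey());
            if (!real.equals(row.getValue())) {
                database.put(row.getKey(), real); // write the real status back
                changed++;
            }
        }
        return changed;
    }

    /** Repeats syncOnce() at a configurable interval. */
    void start(long intervalSeconds) {
        timer.scheduleWithFixedDelay(this::syncOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}

class StatusSyncDemo {
    public static void main(String[] args) {
        // Pretend every VM is found stopped, e.g. after an unexpected shutdown.
        StatusSyncer syncer = new StatusSyncer(uuid -> "Stopped");
        syncer.track("vm-1", "Running");
        System.out.println("corrected " + syncer.syncOnce() + " record(s)");
        syncer.stop();
    }
}
```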

The comprehensive query APIs are another innovation ZStack introduces to help manage large numbers of resources. Users can perform SQL-like queries through APIs without touching the underlying database directly. ZStack provides more than four million single query conditions and countless combined query conditions. As long as the database tables are linked by foreign keys, users can execute a query spanning multiple tables, for example, querying a zone that contains a VM with an EIP. ZStack parses query APIs and generates both single-table queries and multi-table joins automatically. Users can query any resource by any field without looking up the user manual; ZStack’s command-line tool (CLI) provides auto-completion that suggests queryable fields and joinable resources.
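To make the idea of turning API conditions into single-table queries and joins concrete, here is a minimal sketch that supports one-hop joins only. Everything in it is an assumption for illustration: the condition syntax ("state=Enabled", "vm.state=Running"), the class name QueryTranslator, and the table and column names are not ZStack’s actual query API or schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical translator from API-style conditions to parameterized SQL.
final class QueryTranslator {
    // field, operator, value; two-character operators are tried first.
    private static final Pattern CONDITION =
            Pattern.compile("([A-Za-z0-9_.]+)(!=|>=|<=|=|>|<)(.*)");

    private final String table;              // primary table, aliased as "t"
    private final Set<String> columns;       // queryable fields of the primary table
    private final Map<String, String> joins; // alias -> "Table alias ON <foreign-key condition>"

    QueryTranslator(String table, Set<String> columns, Map<String, String> joins) {
        this.table = table;
        this.columns = columns;
        this.joins = joins;
    }

    /** Returns the SQL text and appends bound values to params in placeholder order. */
    String toSql(List<String> conditions, List<String> params) {
        Set<String> usedJoins = new LinkedHashSet<>();
        List<String> where = new ArrayList<>();
        for (String condition : conditions) {
            Matcher m = CONDITION.matcher(condition);
            if (!m.matches()) {
                throw new IllegalArgumentException("bad condition: " + condition);
            }
            String field = m.group(1);
            int dot = field.indexOf('.');
            String column;
            if (dot < 0) {
                if (!columns.contains(field)) {
                    throw new IllegalArgumentException("unknown field: " + field);
                }
                column = "t." + field;
            } else {
                // One-hop join through a foreign key; the joined column is not validated here.
                String alias = field.substring(0, dot);
                if (!joins.containsKey(alias)) {
                    throw new IllegalArgumentException("not joinable: " + alias);
                }
                usedJoins.add(alias);
                column = field;
            }
            where.add(column + " " + m.group(2) + " ?");
            params.add(m.group(3)); // values are bound, never concatenated into SQL
        }
        StringBuilder sql = new StringBuilder("SELECT DISTINCT t.* FROM " + table + " t");
        for (String alias : usedJoins) {
            sql.append(" JOIN ").append(joins.get(alias));
        }
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where));
        }
        return sql.toString();
    }
}

class QueryDemo {
    public static void main(String[] args) {
        QueryTranslator zones = new QueryTranslator("Zone", Set.of("uuid", "name", "state"),
                Map.of("vm", "VmInstance vm ON vm.zoneUuid = t.uuid"));
        List<String> params = new ArrayList<>();
        System.out.println(zones.toSql(List.of("state=Enabled", "vm.state=Running"), params));
        System.out.println(params);
    }
}
```

Here the join appears only when a condition actually references the joined resource, which mirrors the "generate joins automatically" behavior described above. Real multi-hop queries, such as zone to VM to EIP, would need a path search over the foreign-key graph.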


In the future, ZStack will provide an enterprise-quality UI similar to Outlook and JIRA, allowing users to create various views based on the query APIs, for example, a view showing all running virtual machines on the same L3 network.


As integration software, IaaS usually needs to manage complex subsystems, including compute, storage and networking. The execution path of a task is typically long, and errors can happen in any subsystem. Because existing IaaS software lacks a mechanism to roll back applied operations on errors, subsystems are often left in intermediate states that can cause future tasks to fail. For example, if a virtual machine fails to start on a hypervisor, its networking information is likely left behind on a networking node. With its workflow engine, ZStack can roll back all previously applied operations when errors happen.


The above picture shows the workflow for creating a user VM. If an error happens, say VmCreateOnHypervisorFlow fails, the workflow engine rolls back all six previous flows, including their subflows, reverting their operations by returning compute capacity, destroying the virtual networking node, deleting disk volumes and releasing IP addresses.
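The rollback behavior can be sketched as a chain of flows, each paired with an undo step, where a failure unwinds the completed flows in reverse order. This is a minimal illustration, not ZStack’s workflow engine. The names Flow, FlowChain and RecordingFlow are hypothetical, and of the flow names used in the test, only VmCreateOnHypervisorFlow appears in the article.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flow contract: every step knows how to undo its own effect.
interface Flow {
    void run(Map<String, Object> context) throws Exception;

    void rollback(Map<String, Object> context);
}

final class FlowChain {
    private final List<Flow> flows;

    FlowChain(List<Flow> flows) {
        this.flows = List.copyOf(flows);
    }

    /** Runs flows in order; if one fails, undoes the completed ones in reverse order. */
    boolean execute(Map<String, Object> context) {
        Deque<Flow> completed = new ArrayDeque<>();
        for (Flow flow : flows) {
            try {
                flow.run(context);
                completed.push(flow);
            } catch (Exception e) {
                context.put("error", e.getMessage());
                while (!completed.isEmpty()) {
                    Flow done = completed.pop(); // most recently completed first
                    try {
                        done.rollback(context);
                    } catch (RuntimeException rollbackError) {
                        // Keep undoing the remaining flows; a real engine would log this.
                    }
                }
                return false;
            }
        }
        return true;
    }
}

// Demo flow that records what happens and can be told to fail.
final class RecordingFlow implements Flow {
    private final String name;
    private final boolean fail;
    private final List<String> log;

    RecordingFlow(String name, boolean fail, List<String> log) {
        this.name = name;
        this.fail = fail;
        this.log = log;
    }

    @Override
    public void run(Map<String, Object> context) throws Exception {
        if (fail) {
            throw new Exception(name + " failed");
        }
        log.add("run " + name);
    }

    @Override
    public void rollback(Map<String, Object> context) {
        log.add("rollback " + name);
    }
}

class FlowDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        FlowChain chain = new FlowChain(List.of(
                new RecordingFlow("AllocateHost", false, log),
                new RecordingFlow("AllocateVolume", false, log),
                new RecordingFlow("VmCreateOnHypervisorFlow", true, log)));
        System.out.println(chain.execute(new HashMap<>()) + " " + log);
    }
}
```

A stack of completed flows makes the undo order the reverse of the run order. This matters because later steps usually depend on resources that earlier steps allocated.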

Besides rollback, the workflow engine also provides a way to configure the execution paths of critical tasks. In the above example, creating a virtual networking node is almost the same as creating a user virtual machine, so ZStack reuses most of the user VM flows. By editing the XML flow configuration file, it replaces only the flow that allocates virtual NICs with ApplianceVMAllocateNicFlow (marked in green).
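The following sketch shows how a chain described in XML can be changed without touching Java code. It assumes a simple format that lists one class name per `<flow>` element and instantiates each class by reflection. The XML layout, the class FlowConfigLoader and every flow name except ApplianceVMAllocateNicFlow and VmCreateOnHypervisorFlow are hypothetical. ZStack’s real configuration format may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical loader: a chain is an XML list of <flow> elements whose text is a
// class name, so swapping one step means editing XML rather than Java code.
final class FlowConfigLoader {
    private FlowConfigLoader() {}

    /** Returns the text of every <flow> element, in document order. */
    static List<String> flowNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("flow");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid flow configuration", e);
        }
    }

    /** Instantiates each named class through its public no-argument constructor. */
    static <T> List<T> instantiate(List<String> classNames, Class<T> type) {
        List<T> flows = new ArrayList<>();
        for (String name : classNames) {
            try {
                Object flow = Class.forName(name).getDeclaredConstructor().newInstance();
                flows.add(type.cast(flow));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot create flow " + name, e);
            }
        }
        return flows;
    }
}

class FlowConfigDemo {
    public static void main(String[] args) {
        // Same shape as a user-VM chain, with only the NIC-allocation step swapped.
        String applianceVm = "<chain name=\"createApplianceVm\">"
                + "<flow>VmAllocateHostFlow</flow>"
                + "<flow>ApplianceVMAllocateNicFlow</flow>"
                + "<flow>VmCreateOnHypervisorFlow</flow>"
                + "</chain>";
        System.out.println(FlowConfigLoader.flowNames(applianceVm));
    }
}
```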


The versatile plugin system, similar to Eclipse and Java OSGi, is another architectural design that differentiates ZStack from other current IaaS software. All ZStack components are built as small plugins, which ensures that adding or removing features does not affect the stability of the software. The backbone of the plugin system is extension points, which allow components to hook into other components’ execution paths. Every component can define its own extension points by exposing hook interfaces to others.


The above picture is an overview of the security group implementation, which needs to hook into virtual machines’ lifecycles to program firewall rules on hypervisors. For a virtual machine, firewall rules are an add-on feature that should not be implemented in the VM’s own business logic, so the security group is implemented as a standalone plugin that hooks into extension points provided by the management node service, the virtual machine service, the query service and the network service. If users remove this plugin by deleting its JAR file and configuration files, the rest of the system is unaffected; only the security group feature is lost.
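The extension-point pattern can be sketched with a hook interface owned by the VM service, a registry where plugins register implementations, and a security-group plugin that hooks in without the VM service knowing anything about firewalls. All names here (VmLifecycleExtensionPoint, PluginRegistry, VmService, SecurityGroupPlugin) are hypothetical, and the registry is a simplified stand-in for ZStack’s plugin loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical extension point exposed by the VM service.
interface VmLifecycleExtensionPoint {
    void afterVmStarted(String vmUuid);

    void afterVmDestroyed(String vmUuid);
}

// Minimal registry: plugins register implementations, components look them up by interface.
final class PluginRegistry {
    private final Map<Class<?>, List<Object>> extensions = new ConcurrentHashMap<>();

    <T> void register(Class<T> point, T implementation) {
        extensions.computeIfAbsent(point, k -> new CopyOnWriteArrayList<>()).add(implementation);
    }

    <T> List<T> get(Class<T> point) {
        List<T> result = new ArrayList<>();
        for (Object o : extensions.getOrDefault(point, List.of())) {
            result.add(point.cast(o));
        }
        return result;
    }
}

// The VM service only calls whatever hooks are registered.
final class VmService {
    private final PluginRegistry registry;

    VmService(PluginRegistry registry) {
        this.registry = registry;
    }

    void startVm(String vmUuid) {
        // ... core start logic would run here ...
        for (VmLifecycleExtensionPoint ext : registry.get(VmLifecycleExtensionPoint.class)) {
            ext.afterVmStarted(vmUuid);
        }
    }

    void destroyVm(String vmUuid) {
        for (VmLifecycleExtensionPoint ext : registry.get(VmLifecycleExtensionPoint.class)) {
            ext.afterVmDestroyed(vmUuid);
        }
    }
}

// A security-group plugin that programs (here: records) firewall rules.
final class SecurityGroupPlugin implements VmLifecycleExtensionPoint {
    final List<String> appliedRules = new CopyOnWriteArrayList<>();

    @Override
    public void afterVmStarted(String vmUuid) {
        appliedRules.add("apply rules for " + vmUuid);
    }

    @Override
    public void afterVmDestroyed(String vmUuid) {
        appliedRules.add("remove rules for " + vmUuid);
    }
}

class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        SecurityGroupPlugin securityGroups = new SecurityGroupPlugin();
        registry.register(VmLifecycleExtensionPoint.class, securityGroups); // "install" the plugin
        new VmService(registry).startVm("vm-1");
        System.out.println(securityGroups.appliedRules);
    }
}
```

If the plugin is never registered, which is the equivalent of deleting its JAR, VmService still works. Only the firewall behavior disappears.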

Tags, also known as labels, are very common in software. Besides the normal use of tags, which help users group resources, ZStack defines so-called system tags that allow plugins to attach additional information to resources without database schema changes. For example, the virtual machine database table has no hostname column; a plugin defines a system tag ‘hostname::{hostname}’ that lets users create virtual machines with a hostname and change it whenever they want. System tags are essentially key-value pairs stored in a separate database table, so plugins can define any properties for resources without altering their database schemas, which reduces the risk of database migrations when upgrading ZStack.
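A template like ‘hostname::{hostname}’ can both produce concrete tags and parse values back out of them. This minimal sketch compiles the template into a regular expression with a named group per placeholder. The class SystemTag and its methods are hypothetical, and the assumption that tags are plain strings stored per resource in a separate table follows the description above rather than ZStack’s actual schema.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical system-tag template such as "hostname::{hostname}". A concrete tag
// is a plain string (e.g. "hostname::web-01") kept in a separate tag table.
final class SystemTag {
    // Placeholder names must be valid regex group names: a letter, then letters or digits.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z][A-Za-z0-9]*)\\}");

    private final String format;
    private final Pattern parser;

    SystemTag(String format) {
        this.format = format;
        // Build e.g. \Qhostname::\E(?<hostname>.+?) from the template.
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(format);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(format.substring(last, m.start())));
            regex.append("(?<").append(m.group(1)).append(">.+?)");
            last = m.end();
        }
        regex.append(Pattern.quote(format.substring(last)));
        this.parser = Pattern.compile(regex.toString());
    }

    /** Fills the placeholders, e.g. {hostname} -> web-01. */
    String instantiate(Map<String, String> values) {
        String tag = format;
        for (Map.Entry<String, String> e : values.entrySet()) {
            tag = tag.replace("{" + e.getKey() + "}", e.getValue());
        }
        return tag;
    }

    /** Extracts one placeholder's value if the tag matches this template. */
    Optional<String> token(String tag, String name) {
        Matcher m = parser.matcher(tag);
        return m.matches() ? Optional.of(m.group(name)) : Optional.empty();
    }
}

class SystemTagDemo {
    public static void main(String[] args) {
        SystemTag hostname = new SystemTag("hostname::{hostname}");
        String tag = hostname.instantiate(Map.of("hostname", "web-01"));
        System.out.println(tag);                                   // hostname::web-01
        System.out.println(hostname.token(tag, "hostname").get()); // web-01
    }
}
```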

The purpose of the configurable workflow engine, the plugin system and system tags is to loosely couple the entire architecture, allowing developers to add new features quickly while keeping the software stable and avoiding the instability caused by frequently refactoring existing code.

ZStack is developed in a test-driven development (TDD) manner. Three rigorous testing systems guard every feature: the integration testing system, the system testing system and the model-based testing system. The integration testing system uses simulators to validate the business logic of management nodes; the system testing system tests scenarios on real hardware. The model-based testing system, the most exciting part, generates test cases by arranging API calls with different algorithms (e.g., random, weight-driven, history-driven) to exercise corner cases that rarely occur in normal use. Because a model-based test may run for several days and invoke thousands of APIs, a defect it finds is very hard to debug. To avoid reproducing the defect manually, ZStack offers a tool that reads the logs to generate a test case that replays the testing procedure, helping developers rebuild the failure environment.
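Of the algorithms listed, the weight-driven one is the easiest to sketch. Each API is picked with probability proportional to its weight, and a fixed seed plus a logged call sequence makes a run reproducible. The class WeightedApiGenerator, the API names and the "API: <name>" log format are assumptions for illustration, not ZStack’s testing framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical weight-driven generator: picks API names with probability
// proportional to their weights. A fixed seed makes a run reproducible.
final class WeightedApiGenerator {
    private final List<String> apis = new ArrayList<>();
    private final List<Integer> cumulative = new ArrayList<>();
    private final Random random;
    private int total;

    WeightedApiGenerator(Map<String, Integer> weights, long seed) {
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (e.getValue() <= 0) {
                throw new IllegalArgumentException("weight must be positive: " + e.getKey());
            }
            total += e.getValue();
            apis.add(e.getKey());
            cumulative.add(total); // running sum defines each API's slice of [0, total)
        }
        if (apis.isEmpty()) {
            throw new IllegalArgumentException("no APIs to choose from");
        }
        this.random = new Random(seed);
    }

    String next() {
        int r = random.nextInt(total); // uniform in [0, total)
        for (int i = 0; i < cumulative.size(); i++) {
            if (r < cumulative.get(i)) {
                return apis.get(i);
            }
        }
        throw new IllegalStateException("unreachable");
    }

    List<String> sequence(int length) {
        List<String> calls = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            calls.add(next());
        }
        return calls;
    }

    /** Rebuilds the call sequence from log lines of the assumed form "API: <name>". */
    static List<String> replayFromLog(List<String> logLines) {
        List<String> calls = new ArrayList<>();
        for (String line : logLines) {
            if (line.startsWith("API: ")) {
                calls.add(line.substring("API: ".length()));
            }
        }
        return calls;
    }
}

class ModelTestDemo {
    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("CreateVm", 5);
        weights.put("StopVm", 3);
        weights.put("DestroyVm", 1);
        System.out.println(new WeightedApiGenerator(weights, 42L).sequence(10));
    }
}
```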



ZStack is the only IaaS software that claims to manage hundreds of thousands of physical servers and millions of virtual machines, and to serve tens of thousands of concurrent API calls with a single management node. With simulators, ZStack passed stress tests managing one hundred thousand hosts and creating one million virtual machines with 10,000 and 30,000 concurrent API requests. Creating virtual machines is extremely fast in ZStack; here is the performance data:

Number of VMs    Time taken
1                0.51 seconds
10               1.55 seconds
100              11.33 seconds
1,000            103 seconds
10,000           23 minutes

The test was carried out with real virtual machines and simulators, using only one virtual networking node. The performance bottleneck observed in the test was the DHCP/DNS software, Dnsmasq. With a Dnsmasq patched to use inotify, the time to create 10,000 virtual machines dropped to 11 minutes.

The secret of ZStack’s high scalability lies in three architectural designs: the asynchronous architecture, the stateless-services architecture and the lock-free architecture.

The asynchronous architecture ensures that tasks are executed asynchronously all the way from APIs to agents on devices, so no thread is blocked waiting for a task to complete; as a result, a thread pool with 1,000 threads can easily handle 10,000 concurrent API calls. The asynchronous architecture is made up of three parts: asynchronous messages, asynchronous method calls and asynchronous HTTP calls. Asynchronous messages are used by microservices to communicate with each other; asynchronous method calls are used by components inside microservices; and asynchronous HTTP calls are used for communication between microservices and agents on external devices. An overview of how they interoperate looks like this:
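The claim that a few threads can serve many more concurrent requests rests on never blocking while waiting. A minimal sketch using CompletableFuture (swapped in here as a generic async mechanism, not necessarily ZStack’s own callback style) chains an async method call into a simulated async agent call. The class AsyncVmService, the hard-coded host and the 50 ms "agent latency" produced by a timer are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical non-blocking pipeline: API -> service method -> agent call.
// The "agent" is simulated by a timer, so no thread ever sleeps waiting for it.
final class AsyncVmService {
    private final ScheduledExecutorService timer;

    AsyncVmService(ScheduledExecutorService timer) {
        this.timer = timer;
    }

    /** Simulated asynchronous HTTP call to an agent: completes after latencyMillis. */
    CompletableFuture<String> callAgent(String command, long latencyMillis) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        timer.schedule(() -> {
            reply.complete("ok: " + command);
        }, latencyMillis, TimeUnit.MILLISECONDS);
        return reply;
    }

    /** Asynchronous method call: picks a host, then asks its agent to start the VM. */
    CompletableFuture<String> startVm(String vmUuid) {
        return CompletableFuture.completedFuture("host-1") // stand-in for async host allocation
                .thenCompose(host -> callAgent("start " + vmUuid + " on " + host, 50));
    }
}

class AsyncDemo {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        AsyncVmService service = new AsyncVmService(timer);
        List<CompletableFuture<String>> replies = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            replies.add(service.startVm("vm-" + i)); // returns immediately
        }
        CompletableFuture.allOf(replies.toArray(new CompletableFuture<?>[0])).join();
        System.out.println(replies.get(0).join()); // ok: start vm-0 on host-1
        timer.shutdown();
    }
}
```

In the demo, 1,000 in-flight requests are handled by two timer threads, because no thread sits blocked while an "agent" is working.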


The stateless-services architecture uses a consistent hashing ring to distribute messages to service instances in a multi-node environment. It is called stateless because service instances don’t need to exchange information about the resources they manage, and services sending messages don’t need to know which instance will handle them; the state is decoupled from the services and moved into the consistent hashing ring.
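A consistent hashing ring is easy to sketch with a sorted map. Each management node is placed on the ring at several virtual positions, and a resource belongs to the first node position clockwise from the resource’s hash. The choice of MD5, 100 virtual nodes per management node and the node and resource names are assumptions of this sketch, not ZStack’s settings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Consistent hashing ring with virtual nodes (hash function and replica count are assumptions).
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String nodeId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(nodeId + "#" + i), nodeId);
        }
    }

    void removeNode(String nodeId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(nodeId + "#" + i));
        }
    }

    /** The node responsible for a resource: first ring position at or after its hash, wrapping around. */
    String nodeFor(String resourceUuid) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no management nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(resourceUuid));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest as a long.
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

class RingDemo {
    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("mn-1");
        ring.addNode("mn-2");
        ring.addNode("mn-3");
        System.out.println("vm-42 is handled by " + ring.nodeFor("vm-42"));
    }
}
```

Every sender computes the same owner from the same ring, so no instance has to ask another who owns a resource. Removing a node moves only the resources that node owned.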


With the stateless-services architecture, a multi-node environment with 100 nodes is as stable as one with two nodes. Adding or removing nodes does not affect the system’s stability.

Because of the stateless-services architecture, ZStack doesn’t use any locks to control competition for resources; instead, it synchronizes operations with queues, since operations on the same resource are always routed to the same service instance. ZStack’s queues are FIFO queues with parallelism levels from one to N. Resources that can only execute operations one at a time use queues with a parallelism level of one:


Resources that can execute operations in parallel use queues with parallelism level N:


Parallelism levels of critical resources can be configured through global settings in the UI. For example, users can configure KVM hosts to execute at most 10 operations simultaneously.
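A per-resource FIFO queue with a configurable parallelism level can be sketched in a few lines. This is an illustration under assumptions, not ZStack’s queue implementation: the names SyncQueue and ResourceQueues are hypothetical, and the small monitor used here guards only the queue’s own bookkeeping. No lock is held on the resource while a task runs.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToIntFunction;

// Hypothetical FIFO queue for one resource: at most `parallelism` tasks run at once,
// the rest wait in arrival order. The monitor guards only the counters and the queue.
final class SyncQueue {
    private final ExecutorService pool;
    private final int parallelism;
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private int running;

    SyncQueue(ExecutorService pool, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        this.pool = pool;
        this.parallelism = parallelism;
    }

    void submit(Runnable task) {
        synchronized (this) {
            pending.add(task);
        }
        drain();
    }

    // Starts waiting tasks while there is spare parallelism.
    private void drain() {
        while (true) {
            final Runnable next;
            synchronized (this) {
                if (running >= parallelism || pending.isEmpty()) {
                    return;
                }
                next = pending.poll();
                running++;
            }
            pool.execute(() -> {
                try {
                    next.run();
                } finally {
                    synchronized (this) { // `this` is the SyncQueue inside a lambda
                        running--;
                    }
                    drain(); // let the next waiting task start
                }
            });
        }
    }
}

// Hypothetical registry: one queue per resource UUID; the parallelism level comes
// from a global-setting lookup (e.g. 10 for KVM hosts) when the queue is created.
final class ResourceQueues {
    private final ExecutorService pool;
    private final ToIntFunction<String> parallelismFor;
    private final Map<String, SyncQueue> queues = new ConcurrentHashMap<>();

    ResourceQueues(ExecutorService pool, ToIntFunction<String> parallelismFor) {
        this.pool = pool;
        this.parallelismFor = parallelismFor;
    }

    void submit(String resourceUuid, Runnable task) {
        queues.computeIfAbsent(resourceUuid,
                uuid -> new SyncQueue(pool, parallelismFor.applyAsInt(uuid))).submit(task);
    }
}

class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Assumed settings: hosts allow 10 concurrent operations, everything else 1.
        ResourceQueues queues = new ResourceQueues(pool, uuid -> uuid.startsWith("host-") ? 10 : 1);
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            int step = i;
            queues.submit("vm-1", () -> {
                System.out.println("vm-1 operation " + step);
                done.countDown();
            });
        }
        done.await(); // wait before shutdown so queued tasks can still be started
        pool.shutdown();
    }
}
```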


Earlier, in the discussion of stability, we introduced the workflow engine, the plugin system and system tags, which keep ZStack’s orchestration stable as new features are added. They are also the keys to extensibility. ZStack’s orchestration services provide only core APIs for managing compute, storage and networking. Developers can create plugins that provide advanced APIs for users’ scenarios. For example, ZStack’s snapshot APIs only allow users to create, restore and back up snapshots; there are no APIs to schedule periodic snapshot creation and backup. Such a feature can be implemented as a plugin that provides cron-job-like APIs and invokes the snapshot APIs to create and back up snapshots.
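The scheduling plugin described above could look roughly like the following. It is a thin layer that calls the existing snapshot operations on a timer without changing them. SnapshotApi is a hypothetical stand-in for ZStack’s snapshot APIs, and SnapshotSchedulerPlugin and its schedule method are illustrative names, not a real ZStack plugin.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the core snapshot APIs the plugin builds on.
interface SnapshotApi {
    String createSnapshot(String volumeUuid); // returns the new snapshot's UUID

    void backupSnapshot(String snapshotUuid);
}

// Hypothetical plugin adding cron-like scheduling without changing the core APIs.
final class SnapshotSchedulerPlugin {
    private final SnapshotApi api;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    SnapshotSchedulerPlugin(SnapshotApi api) {
        this.api = api;
    }

    /** Creates and backs up a snapshot of the volume now, then once every `period`. */
    ScheduledFuture<?> schedule(String volumeUuid, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                api.backupSnapshot(api.createSnapshot(volumeUuid));
            } catch (RuntimeException e) {
                // An exception escaping here would suppress all later runs, so report and continue.
                System.err.println("snapshot job failed for " + volumeUuid + ": " + e.getMessage());
            }
        }, 0, period, unit);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}

class SnapshotDemo {
    public static void main(String[] args) throws InterruptedException {
        SnapshotApi fake = new SnapshotApi() {
            @Override
            public String createSnapshot(String volumeUuid) {
                return "snap-of-" + volumeUuid;
            }

            @Override
            public void backupSnapshot(String snapshotUuid) {
                System.out.println("backed up " + snapshotUuid);
            }
        };
        SnapshotSchedulerPlugin plugin = new SnapshotSchedulerPlugin(fake);
        plugin.schedule("vol-1", 1, TimeUnit.SECONDS);
        Thread.sleep(2500); // roughly three runs
        plugin.shutdown();
    }
}
```

The plugin only consumes the public snapshot operations, so the core orchestration code does not change when scheduling is added or removed.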

Besides in-process plugins, ZStack also offers a way to create out-of-process plugins for features not tightly bound to the core orchestration services. For example, a billing system can be built entirely out of process by listening to canonical events exposed on the message bus. The ZStack web UI is a typical example of an out-of-process plugin. Plugins that need to access the database and internal data structures of orchestration services can be built as semi-out-of-process plugins, in which a small in-process component sends data to an out-of-process component through the message bus:


Author Frank Zhang joined Intel OTC (Open Source Technology Center) in 2006 and worked on the XEN hypervisor, contributing multiple features, such as the XEN E100 network card emulator and Windows support for the XEN/IA64 guest BIOS, along with many bug fixes. In 2010, Frank Zhang joined (acquired by Citrix), working on CloudStack. After leaving Citrix, Frank and his partner founded a new open source IaaS project, ZStack.

