In a typical microservice architecture, multiple instances of each microservice are deployed for availability and scalability. This limits the impact of the failure of a single microservice instance and maintains overall system reliability. Successfully adopting this architecture requires a load balancing mechanism to ensure that incoming requests are spread across all of the available instances, rather than overloading some instances at the same time other instances are under-utilized.
Conventional wisdom suggests a central load balancer is the appropriate mechanism for distributing requests between available microservice instances. This can be done two ways: requests to all microservices can be sent to a central load balancer for balanced distribution between the available instances (possibly with real-time health checks), or each microservice can have its own central load balancer handling only requests for that microservice. Popular choices for load balancing include HAProxy and Amazon’s Elastic Load Balancer (ELB).
Using a single central load balancer for an entire application ecosystem essentially duplicates the standard monolith architecture, but increases the number of instances to be served by one load balancer enormously. In addition to being a single point of failure for the entire system should it go down, this single load balancer can very quickly become a major bottleneck, since all traffic to every microservice has to pass through it, as illustrated in the diagram below showing just a handful of active microservices:
Using a separate central load balancer per microservice type may seem like a logical extension of the standard monolith behavior. In this model, incoming traffic for each type of microservice is sent to a different load balancer, which then routes each request to the next instance in its rotation as follows:
However, closer inspection suggests some tradeoffs with this approach. First of all, adding (or removing) new microservices or additional instances of existing microservices to the system comes with a lot of overhead. Each new component needs to be registered and de-registered as it is added or removed. Further, each new type of microservice needs its own central load balancer, so over time a microservices environment would carry the burden of dozens of load balancers that each need to be managed and maintained.
In addition, while using a central load balancer per microservice type may not add a single point of failure for the entire application, it does add a single point of failure for each microservice type. If the load balancer or the machine it’s running on fails in some way, the entire microservice becomes unavailable which may, in fact, render the entire system unusable if the service is essential to the operation of the system as a whole. This can be handled to some extent by using several central load balancers in concert with each other, but this adds additional complexity around coordination and doubles (or more) the total number of load balancers that must be maintained for the system.
Another issue with central load balancers is scalability. As mentioned above, an environment using microservices might have dozens of components working in concert, each with multiple instances. The number of network connections needed to manage the communication between components grows exponentially. While it is unlikely a central load balancer handling traffic for just one microservice will become a real bottleneck, it certainly could add churn or consume increasingly large amounts of resources as the number of available instances of its microservice increases.
Client-Side Load Balancing
Baker Street takes a different approach, and pushes load balancing to each client. In this approach, load balancing is fully distributed, with each client directly responsible for routing requests to an available microservice. Note that in this model each local load balancer handles all traffic for its local instance regardless of which or how many other microservices it communicates with; this inverts the central load balancer model where requests from everywhere go to the load balancer, which then sends them on to a single type of microservice:
Using a series of client-side load balancers working in concert with each other provides several benefits to a microservice environment: it simplifies service management, it automatically scales the system in accordance with the number of available instances, and it eliminates all single points of failure.
In particular, Baker Street creates a simpler management model: there is a 1:1 mapping between a microservice instance and local load balancer (no central load balancer required!), which means every microservice can be configured and set up in exactly the same way using a default configuration that works for most services. In addition, the distributed architecture exhibits linear scale: each new microservice instance adds new load balancing capacity. Thus, the system is self-provisioning and automatically provides the capacity needed to handle the available instances of a service. Finally, by storing availability information locally with each load balancer instance, Baker Street ensures that all active microservice instances can still route traffic, even if some instances of the microservice or instances of Baker Street components have gone down (see the architecture discussion below for more detail about this).
Baker Street Architecture
In order to function as a local load balancer, Baker Street needs to track the availability and location of all instances of the microservices within the larger system. This is done using three components:
- Sherlock: An HAProxy-based routing system running locally with each instance of your microservice to determine where connections from that instance should go.
- Watson: A health checker with local instances corresponding to each instance of your application.
- Datawire Directory: A lightweight, global service discovery mechanism that receives availability information from each Watson instance and pushes changes in availability to local Sherlock instances as needed.
After a new instance of a microservice is provisioned, its Watson service registers when it goes live and sends a message notifying the Datawire Directory. The Datawire Directory sends “it’s alive” messages to registered Sherlock instances, letting them know the new instance is available to process requests.
Similarly, when an instance goes down, Watson either tells the Datawire Directory that the instance is now unavailable or, if the whole container is down, the Datawire Directory notices the loss of its connection to the Watson instance and marks the relevant microservice instance as unavailable. Messages are then sent to the registered Sherlock instances to tell them to stop sending traffic to that instance of the microservice.
A microservice that wants to connect to another microservice first connects locally to Sherlock. Sherlock proxies the connection from the local microservice to an available microservice to handle the request. It knows where to send the requests because it keeps an up-to-date listing of available instances, managed via messages from the central Datawire Directory service which contains two things:
- A table of available routes mapping virtual to physical addresses.
- A list of client nodes interested in receiving updates.
Both lists are empty when Baker Street is first deployed. As noted above, each local Watson health check instance sends the Datawire Directory the virtual and physical addresses of each live service the first time it goes live and sends a notice to remove the information if a subsequent health check fails; these messages populate the directory listing and ensure it remains current over time. Each local Sherlock load balancer contains the address of the Datawire Directory and its own local list of available routes, which is empty when the instance first goes live. Each Sherlock registers with the Datawire Directory to receive updates and the Datawire Directory pushes changes to available routes to each registered Sherlock whenever such changes are received from a Watson monitor.
This publish/subscribe architecture minimizes messages related to health and routing, while ensuring rapid updates whenever there is a change in system health. The system is also self-managing: each Sherlock and Watson automatically register with the central directory service. Finally, additional load balancing capacity is added with every new microservice instance.
In addition, because each local Sherlock instance maintains its own list of available servers, the system remains operational even if the Datawire Directory component goes down; the only issue in this case is that updates about the health of instances are not received or acted upon. This means there is a slightly increased chance of sending traffic to a defunct instance that cannot handle it.
Baker Street eliminates all other single points of global failure as well. If a local microservice instance goes down but its container remains active, then its corresponding Watson instance will inform the central directory to remove it from rotation, and all of the load balancers will be notified to remove that instance from their list of available nodes. If an entire client environment goes down, the Watson discovery service may not have a chance to notify the Datawire Directory directly, but the messaging link between the two will also go down and the directory service will notify the interested Sherlock systems that the instance is down. Basically, no matter the circumstances, even though a few connections may be lost, the system as a whole should remain functional because the majority of instances still work and still have access to their local load balancer containing information that is still mostly up to date.
One possible argument against using client-side load balancing is the effectiveness of actually balancing usage of a service across the entire global network of instances. Each load balancer will distribute connections more-or-less evenly across the whole system and over time the global traffic should be evenly distributed, but in theory nothing prevents each local load balancer from randomly deciding to send its next connection to the same instance as all of the other load balancers in the system and overloading that instance. The odds of this happening are low, especially in large distributed systems, and in practice this does not seem to happen. If you are particularly concerned about this possibility, it is easy to combat by designing the service discovery used by the local load balancers so that the load balancers give a slight bias to nearby instances.
The approach used by Baker Street has been used in numerous cloud deployments, including Airbnb’s SmartStack and Yelp. Baker Street builds on these architectures, but with an emphasis on simplicity and usability by employing a streamlined installation and configuration process using sensible defaults. The result is an efficient load balancer that is reliable, automatically scales and is easy to manage.