HAProxy Is Still An Arrow in the Quiver for Those Scaling Apps

HAProxy is not exactly sexy, but it is powerful, an example of how Internet scale is doable for even the smallest of developer teams. It’s one of those technologies that has stayed essential as new offerings have appeared over the years. The latest is Docker, a technology that makes for faster app development but still requires the load balancing that HAProxy provides when services scale.
HAProxy runs on Linux, Solaris, and FreeBSD. It is designed to improve the performance and reliability of a server environment by distributing workloads such as a database across multiple servers. The creator of HAProxy, Willy Tarreau, describes the load balancer as follows:
HAProxy is a very fast and reliable load balancer and reverse proxy for HTTP and TCP-based protocols, which is particularly suited to build highly available architectures. It can check servers state, report its own state to an upper-level load-balancer, share the load among several servers, ensure session persistence through the use of HTTP cookies, protect the server from surges by limiting the number of simultaneous requests, add/modify/delete incoming and outgoing HTTP headers, block or tarpit requests matching pre-defined criteria and protect against some forms of DDoS. Its simple event-based architecture provides very high performance and make its code auditable. I’ve had report of several moderate traffic sites (10 to 100 Mbps) using it with success at constant loads of up to several thousands hits per second. No break-in has ever been reported (yet). It’s known to work at least on FreeBSD, Linux, OpenBSD and Solaris. I too use it to protect my web server and to provide IPv6 connectivity.
But who uses HAProxy and why? Digital Ocean has adopted it and offers extensive documentation about it. The company has a detailed tutorial about implementing SSL Termination with HAProxy on Ubuntu 14.04. The tutorial walks step by step through setting up the open-source TCP/HTTP load balancer and proxy.
In its tutorial, Digital Ocean reviews how to use HAProxy for SSL termination, for traffic encryption, and for load-balancing Web servers. It also shows how to use HAProxy to redirect HTTP traffic to HTTPS. Native SSL support was implemented in HAProxy 1.5.x, which was released as a stable version in June 2014.
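To make the idea concrete, here is a minimal sketch in Python that renders the kind of configuration such a setup might use: one frontend for plain HTTP, one that terminates SSL, and a backend that redirects anything not received over SSL. This is not the tutorial's own configuration. The section names, the certificate path, the timeouts, and the server addresses are all assumptions for illustration.

```python
def render_ssl_termination_config(cert_path, servers):
    """Return haproxy.cfg text that terminates SSL and redirects HTTP to HTTPS.

    cert_path: path to a combined certificate-plus-key PEM file (hypothetical).
    servers:   list of (name, "host:port") tuples for the plain-HTTP Web servers.
    """
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 50s",
        "    timeout server 50s",
        "",
        "frontend www-http",
        "    bind *:80",
        "    default_backend www-backend",
        "",
        "frontend www-https",
        # "ssl crt" asks HAProxy (1.5 or later) to terminate SSL on this port.
        f"    bind *:443 ssl crt {cert_path}",
        "    default_backend www-backend",
        "",
        "backend www-backend",
        # Requests that did not arrive over SSL are redirected to HTTPS.
        "    redirect scheme https if !{ ssl_fc }",
    ]
    for name, address in servers:
        # "check" turns on health checking for this server.
        lines.append(f"    server {name} {address} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ssl_termination_config(
        "/etc/ssl/private/example.com.pem",  # assumed path
        [("www-1", "10.0.0.11:80"), ("www-2", "10.0.0.12:80")],  # assumed hosts
    ))
```

Because SSL ends at the load balancer, the Web servers behind it only ever see plain HTTP, which is why the server lines point at port 80.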
Customers often start to use HAProxy when they adopt more sophisticated infrastructures, according to Moisey Uretsky, Digital Ocean’s Chief Product Officer and the company’s co-founder:
Usually, the process for customers is that they start off with a large monolithic app and a db of their choice on a single server. They split off the db first, and as they begin to work toward a more distributed architecture, the next logical step forward is to split up the Web service across several nodes. Without a load-balancer service, this work must all be done by the customer. With HAProxy, nginx, and other open-source projects, today’s load balancers are very powerful, able to serve a tremendous amount of traffic with very little resource usage. For us, it was the logical next step to provide this functionality for customers so they wouldn’t need to fumble with the setup, configuration, and running of the service. It allows them to get back to shipping code sooner and having a more reliable infrastructure to serve their requests from.
The documentation for how to use HAProxy could make for a long blog post. For our purposes, we thought it would be helpful to offer some examples of companies that are using HAProxy in their operations as well as examples of companies that are applying it for newer deployments, such as on Docker.
GitHub
GitHub has been running HAProxy for several years. Earlier this year, GitHub posted about improving its SSL setup “by deploying forward secrecy and improving the list of supported ciphers.” Forward secrecy comes with a number of considerations, including the need to eliminate session tickets, whose encryption key was held in memory for the entire lifetime of the process. An attacker who obtained that key could decrypt traffic from any prior session whose ticket was encrypted with it. To remedy this, GitHub relied on session IDs instead, which were kept for only a five-minute window on the GitHub platform. The move preserved forward secrecy while continuing to offer customers a high level of performance.
Airbnb
Airbnb uses SmartStack to manage its considerable workloads. The company adopted the SmartStack approach as it grew from a monolithic application to one with a significant code base and a growing number of engineers touching different parts of the code. SmartStack has two components: Nerve and Synapse. Nerve does the service registration, and Synapse provides the service discovery. Airbnb finds HAProxy more effective than other services due to such features as advanced load-balancing algorithms, queueing controls, retries, and timeouts. In a blog post, they also cited built-in health checking, which is used to guard against network partitions.
Using HAProxy for Docker Service Discovery
Jason Wilder describes a way to solve the “backend service problem” using service discovery for Docker containers that run on distributed infrastructures. In a previous post, he described how to create an automated nginx reverse proxy for Docker containers running on the same host.
The architecture is modeled after Airbnb’s SmartStack but, reflecting how Docker is shaping a new workflow, it differs from Airbnb’s in that it uses “etcd instead of Zookeeper and two Docker containers running docker-gen and HAproxy instead of Nerve and Synapse.”
Wilder described how discovering services is handled by the jwilder/docker-discover container. Docker-discover polls etcd periodically and generates an HAProxy config with listeners for each type of registered service. HAProxy is further used to do health checks in case any of the backend services goes down.
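The config-generation step can be sketched in a few lines of Python. This is an illustration of the idea only, not docker-discover's actual code or template. The registry layout, the function names, and the choice of TCP mode with round-robin balancing are assumptions, and a plain dictionary stands in for whatever would really be read from etcd.

```python
def render_listeners(registry):
    """Build one HAProxy 'listen' section per registered service type.

    registry maps a service name to a dict with:
      "port":     the local port HAProxy should listen on
      "backends": a list of "host:port" strings for running containers
    (This layout is hypothetical; a real tool would read it from etcd.)
    """
    sections = []
    for service in sorted(registry):
        entry = registry[service]
        lines = [
            f"listen {service}",
            f"    bind *:{entry['port']}",
            "    mode tcp",
            "    balance roundrobin",
        ]
        for index, address in enumerate(sorted(entry["backends"]), start=1):
            # "check" lets HAProxy health-check each registered backend.
            lines.append(f"    server {service}-{index} {address} check")
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


def regenerate(registry, previous_config):
    """Return (config, changed) so a poller reloads HAProxy only when needed."""
    config = render_listeners(registry)
    return config, config != previous_config


if __name__ == "__main__":
    registry = {
        "web": {"port": 8000, "backends": ["10.0.0.5:49153", "10.0.0.6:49154"]},
        "db": {"port": 5432, "backends": ["10.0.0.7:49155"]},
    }
    config, changed = regenerate(registry, previous_config="")
    print(config)
```

A polling loop would call `regenerate` on each pass and reload HAProxy only when `changed` is true, so an unchanged registry costs nothing.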
If any of the backend services goes down, HAProxy health checks remove it from the pool and retry the request on a healthy host. This allows backend services to be started and stopped as needed and tolerates inconsistencies in the registration information, all with minimal client impact.
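The behavior described here, dropping a failed backend from the pool and retrying on a healthy one, can be modeled with a small Python class. This is a simplified model for illustration, not HAProxy's implementation. The class name, the `probe` and `transport` callables, and the retry count are all hypothetical.

```python
class BackendPool:
    """Round-robin pool that skips unhealthy backends and retries failures."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def run_health_checks(self, probe):
        """Re-evaluate every backend; probe(backend) returns True if it is up."""
        self.healthy = {b for b in self.backends if probe(b)}

    def _pick(self):
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends available")
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        return backend

    def send(self, request, transport, retries=3):
        """Try the request, moving to another healthy backend on failure."""
        last_error = None
        for _ in range(retries + 1):
            backend = self._pick()
            try:
                return transport(backend, request)
            except ConnectionError as error:
                # Treat a failed connection like a failed health check.
                self.healthy.discard(backend)
                last_error = error
        raise last_error


if __name__ == "__main__":
    def flaky_transport(backend, request):
        # Assumed stand-in for a network call: one host is down.
        if backend == "10.0.0.5:8000":
            raise ConnectionError(f"{backend} is unreachable")
        return f"{backend} handled {request}"

    pool = BackendPool(["10.0.0.5:8000", "10.0.0.6:8000"])
    print(pool.send("GET /", flaky_transport))
```

The client sees only the successful response. A later call to `run_health_checks` can return a recovered backend to the pool.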
Again, HAProxy is used for an important function in a relatively new type of workflow, showing its longevity and usefulness.
Stack Exchange
The Stack Exchange service made its name with Stack Overflow. It has used HAProxy since day one of the company, and it always runs the dev version to get the most out of it:
One of the reasons that HAProxy is so damn good at what it does is that it is single-minded, as well as (mostly) single-threaded. This has led it to scale very well for us. One of the nice things about the software being single-threaded is that we can buy a decent sized multi-core server, and as things need more resources, we just split them out to their own tier, which is another HAProxy instance using a different core.
Things get a bit more interesting with SSL, as there is a multi-threaded bit to handle the transaction overhead there. Going deeper into the how of the threading of HAProxy is outside the scope of this post, though, so I’ll just leave it at that.
Instagram
In its first day, Instagram had 25,000 users sign up. We know the rest of the story. Here’s how they did it with a team of two people.
And again, HAProxy is a standby. In the deck, Instagram Co-Founder Mike Krieger explained that elegance and simplicity are what make scaling achievable. That means not re-inventing the wheel: If the app server is having a kernel panic from a heavy load, does that mean you write a monitoring daemon?
No, you don’t.
That’s what HAProxy is for.
“When people started to develop Web-scale apps, they needed a battle-tested load balancer,” said CoreOS Co-Founder Alex Polvi. “It is not super interesting; it just works very well. It’s an arrow in the quiver.”