Ancestry.com’s Docker Story and How It Eventually Led to Kubernetes
Paul MacKay, software engineer architect at Ancestry.com, spoke recently at the Microservices Virtual Summit and laid out the issues with adopting containers and the sea change they created in how the company ran its technology stacks.
Ancestry is a site with 20 billion historical records and 90 million separate family trees with over 10 billion profiles. There are 175 million shareable photos, documents, written stories, and collections for its four million members. This adds up to over 9 petabytes of data.
They currently have nine clusters, both on-premises and on AWS. In the production clusters, there are hundreds of nodes, hundreds of individual services, and thousands of pods running through Docker and Kubernetes.
They went online in 1996 with a team that has remained steady at three to four people. Starting out on Microsoft Windows with C# and the .NET framework to provide services, they expanded over time to include .NET, SQL Server, and IIS, then moved to open source technologies including Java, Node.js, and Python running on Linux.
But after seeing Docker demonstrated at a conference three years ago, they added containers to the mix, which created the need for a change in the way they thought about running their stack.
“We were able to demonstrate to management that it was easy to deploy and to scale these services and that deployment times would drop,” he said. Moving to Docker also enabled them to use their computing resources more effectively.
Expect a Bumpy Road
First of all, he said, adopting new technologies is hard. Developers at Ancestry are not there to create infrastructure but to develop new features for customers. “You have to recognize right up front that adoption of new technologies takes away from developing features you’re providing to your customers,” he said.
Because new technologies can be very disruptive, upper management has to be convinced that they are worth diverting attention, resources, and effort away from new features, he explained.
It’s critical to have a patron, he said, who can give you not just the time and resources you’ll need, but also the real ability to experiment, to do new things. Someone who understands that there may be failure along the way and allows for that.
At the same time, you need buy-in from the people doing the work, McKay explained. At Ancestry, they created pilot teams for rolling out the new microservices technology. These teams don’t just create proofs of concept (POCs); they have a diversity of real problems to solve.
This Is What Support Looks Like
“If [the teams] were not successful, we’re not successful,” he said. “We were part of this change of process.” They gave these teams full support, starting with a lot of training.
When they started the path towards microservices, McKay said, they had a lot of Windows developers who had no experience in Linux, containers, or orchestration, and needed to be trained in how to make them work. Not just the tools, he said, but also the concepts and the paradigms of how to decompose services into smaller chunks. The goal was to make them feel comfortable and empowered to be able to adopt the new technologies.
Then they provided tools to help quickly deploy any size of service, said McKay. These tools work across all clusters and provide conventions and best practices for both new and experienced developers.
When you adopt new technologies, he said, you’re really learning on the fly. And that creates the need to be very agile. There will be mistakes made and that’s okay. They made sure the teams knew they needed to work together and that they’re all going to make each other successful.
Back to Technology
They began the process of deciding which services to break up. McKay explained there are many aspects to consider such as network latency, monitoring, and coordinating the deployments of all these services. “Things are not free,” he pointed out. “There is a cost in managing services.”
And you need to understand how to scale, he said. “Do you scale a portion of these microservices or a subset? Do you scale all of it at once? Does it really make sense that these services will exist by themselves? Is it something useful for the ecosystem?” he asked.
They let developers decide how big a service will be. “We truly feel that with containerization the easiest path is for developers to make a decision as to the size of services and how to decouple things,” he explained.
So they launched Docker, building their own Linux distros, and CoreOS. And when they saw the demo for Kubernetes in beta, McKay bypassed other orchestration tools in favor of the new technology.
He created a small Kubernetes sandbox for pilot teams who were committed to breaking their services apart and trying out Kubernetes for container deployment.
They instituted daily standups to ensure problems were addressed quickly. “There were some hard problems to be solved,” he said. “Solvable problems … but hard and unique and different problems.”
In addition to training, they created best practices and built templates and scripts to give developers a jump start on breaking up services, deploying them in containers, and orchestrating them with Kubernetes, McKay explained.
Moving from REPL to CDEL
Programmers are familiar with the REPL (read-eval-print loop) environment, said McKay. With Kubernetes this changes to compile, deploy, execute, and then loop (CDEL). “This means,” he said, “that there’s no longer this barrier of coordination of deployment, of experimenting with the various size and decoupling. That now you can truly compile, you can get it deployed, you can figure out what it looks like in the environment, and then you can reiterate and figure out what’s appropriate or not.”
Putting Standards in Place
In the year and a half since launching microservices in production, they have developed some deployment standards.
- Following the Kubernetes convention, they create a namespace for each service regardless of size, with a naming convention (functionalgroup-servicename)
- They limit each pod to one container.
- In their production environment, each service has its own repository, regardless of size.
- They use Prometheus for monitoring services, ensuring conformance to SLAs.
- Developers are allowed to deploy all the way to production. They start with very wide privileges and narrow them as needed.
- They create separate clusters for each of the development, staging, and production environments.
- They run their own custom cluster-wide logger and created a namespace portal.
- Kubernetes runs an intra-cluster DNS server, kube-dns, which provides service discovery for deployed microservices and greatly reduces network latency.
- A CPU and memory quota is required for each service, regardless of the size. A developer can make a case for more resources, and they get what they need.
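Several of the standards above lend themselves to an automated check. What follows is only a rough sketch in Python, not Ancestry's actual tooling: the `ServiceSpec` class, its field names, and the `check_standards` and `cluster_dns_name` helpers are hypothetical and invented here for illustration. It assumes the namespace is built as functionalgroup-servicename, that the Kubernetes Service object is named after the service, and that the cluster uses the default `cluster.local` DNS domain.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Kubernetes namespace names must be valid DNS labels: lowercase
# alphanumerics or '-', starting and ending with an alphanumeric,
# and at most 63 characters long.
DNS_LABEL_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


@dataclass
class ServiceSpec:
    """Hypothetical description of one service (illustration only)."""
    functional_group: str
    service_name: str
    containers: List[str] = field(default_factory=list)
    cpu_quota: Optional[str] = None      # e.g. "2" (cores)
    memory_quota: Optional[str] = None   # e.g. "4Gi"

    @property
    def namespace(self) -> str:
        # Convention from the article: functionalgroup-servicename
        return f"{self.functional_group}-{self.service_name}"


def check_standards(spec: ServiceSpec) -> List[str]:
    """Return a list of human-readable violations (empty if none)."""
    problems = []
    ns = spec.namespace
    if len(ns) > 63 or not DNS_LABEL_RE.match(ns):
        problems.append(f"namespace {ns!r} is not a valid DNS label")
    if len(spec.containers) != 1:
        problems.append("each pod should run exactly one container")
    if not spec.cpu_quota or not spec.memory_quota:
        problems.append("a CPU and memory quota is required")
    return problems


def cluster_dns_name(spec: ServiceSpec, cluster_domain: str = "cluster.local") -> str:
    """Name under which the cluster DNS would resolve the service,
    assuming the Service object is named after the service."""
    return f"{spec.service_name}.{spec.namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # All names and values below are made up for the example.
    spec = ServiceSpec("search", "records", ["records"], "2", "4Gi")
    print(check_standards(spec))
    print(cluster_dns_name(spec))
```

A check like this could run in a deployment script before anything reaches a cluster, so that a missing quota or a second container in a pod is caught early rather than in review.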
For more information and a deeper dive, check out his talk here.
CoreOS is a sponsor of The New Stack.
Feature photo and inset photos via Ancestry.com.