How Ticketmaster Used Kubernetes Operators to Fill a DevOps Gap

One of the major new developments in the Kubernetes landscape is the rise of Kubernetes Operators.
Operators give administrators and operations staff a way to grant developers the freedom they want while still enforcing policies and best practices on their self-service deployments.
The advent of Kubernetes Operators can be traced back to the way the cloud computing revolution has centered on developers. Developers wanted dynamically provisioned systems, even if they were restricted in some ways. They drew businesses to the cloud by building new applications that met business goals in shorter windows of time, forcing IT to try to mirror that public cloud on-demand model behind the firewall. This is, perhaps, why Kubernetes was an innovation that came after the public cloud, not before.
Operators allow policies and rules to be laid out on a service-by-service basis within Kubernetes. The pattern arose from the need to manage stateful applications in an otherwise stateless container-based ecosystem. Open source projects that now have Operators built for them include MongoDB, Couchbase, PostgreSQL (Crunchy Data), etcd, Prometheus, and Redis. While this initial set of Operators is very database-focused, there’s no reason the pattern cannot be adopted by other systems, as it already has been for Prometheus monitoring.
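The core idea behind the Operator pattern is a control loop that compares the state a resource asks for with the state that is actually running, then acts on the difference. The following is a minimal sketch of that comparison step only, not code from any of the projects named above; the `DesiredState` and `ObservedState` types and their fields are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DesiredState:
    """What the custom resource asks for (hypothetical fields)."""
    replicas: int
    version: str


@dataclass(frozen=True)
class ObservedState:
    """What is actually running in the cluster (hypothetical fields)."""
    replicas: int
    version: str


def reconcile(desired: DesiredState, observed: ObservedState) -> List[str]:
    """Return the actions needed to move the observed state toward the desired state."""
    actions = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    if observed.replicas < desired.replicas:
        actions.append(f"scale up by {desired.replicas - observed.replicas}")
    elif observed.replicas > desired.replicas:
        actions.append(f"scale down by {observed.replicas - desired.replicas}")
    return actions
```

A real Operator would run this comparison repeatedly against the Kubernetes API and carry out the resulting actions; when the two states already match, the function returns an empty list and nothing happens.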
As an example of a successful implementation of the practice, Kubernetes Operators have enabled each project team at Ticketmaster, a leading live entertainment company, to run its own instance of Prometheus, said Tim Nichols, vice president, technology platform, at Ticketmaster.
“We’re running full steam ahead with Kubernetes and Prometheus. Those are the biggest Cloud Native Computing Foundation (CNCF) projects that we’re adopting. As far as Operators go, we’re primarily using the Prometheus Operator today,” said Nichols, who is responsible for Ticketmaster’s developer tools and platform. “It’s done a lot for us to enable self-service consumption. We did it before [without the Prometheus Operator], but weren’t getting a very consistent service from Prometheus, causing unneeded support overhead that could have been avoided with better standards.”
Explaining why Ticketmaster switched from individually deployed instances of Prometheus to the Prometheus Operator, he said: “It really came down to that ability to define more clear patterns for self-consumption by development teams.”
Ticketmaster previously relied on Helm charts for deployments, “which left too much flexibility for teams to build somewhat snowflake-like models of Prometheus,” Nichols said.
“Using Operators pulled that back in, and gave us a lot more opinionated model for how to build Prometheus in the standard way for Ticketmaster across the board,” Nichols said. “That made things easier for the developers.”
The adoption of the Prometheus Operator at Ticketmaster was a governance and policy win for the IT department, Nichols said.
“The lack of structure we were getting from our old model gave teams too much flexibility, and that meant we had a lot more support to do for our cluster operations team to support those development teams running into problems and having challenges,” Nichols said. “By using Operators, we were able to put more standard tuning in place. For us, it was a governance issue: it gave us the ability to drive those standards.”
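One way to picture the "standard tuning" Nichols describes is a single function that produces every team's Prometheus resource from shared defaults and permits only a short list of overrides. The sketch below is illustrative only: the default values, the `TEAM_OVERRIDABLE` policy, and the naming scheme are assumptions, since the article does not describe Ticketmaster's actual settings. The `monitoring.coreos.com/v1` `Prometheus` kind and the `replicas`, `retention`, `resources`, and `serviceMonitorSelector` fields are those of the Prometheus Operator's custom resource.

```python
import copy
from typing import Optional

# Hypothetical platform-wide defaults; not Ticketmaster's real values.
PLATFORM_DEFAULTS = {
    "replicas": 2,
    "retention": "15d",
    "resources": {"requests": {"memory": "2Gi"}},
}

# Assumed policy: teams may change only these spec fields.
TEAM_OVERRIDABLE = {"retention"}


def prometheus_resource(team: str, overrides: Optional[dict] = None) -> dict:
    """Build a Prometheus custom resource for one team from the shared defaults."""
    overrides = overrides or {}
    rejected = set(overrides) - TEAM_OVERRIDABLE
    if rejected:
        raise ValueError(f"fields not overridable by teams: {sorted(rejected)}")

    # Copy the defaults so one team's resource never mutates the shared dict.
    spec = copy.deepcopy(PLATFORM_DEFAULTS)
    spec.update(overrides)
    # Each instance scrapes only the ServiceMonitors labeled for its own team.
    spec["serviceMonitorSelector"] = {"matchLabels": {"team": team}}

    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": f"{team}-prometheus", "namespace": team},
        "spec": spec,
    }
```

Calling `prometheus_resource("checkout", {"retention": "30d"})` would yield a resource with the longer retention but the standard replica count, while an attempt to override `replicas` is rejected. That contrast is the gist of the shift from free-form charts to an opinionated model.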
Those standards helped increase the velocity of each development project. Instead of fussing with metrics gathering, each team could gather its metrics from day one, without worrying about how its specific needs compared with those of the rest of the organization.
Gathering those metrics early ensures that the developer feedback loop needed to increase velocity is in place from the start of the development process. Development teams can then get the self-service IT support they need to meet business goals on schedule.
“I’ve talked before about the DevOps journey for Ticketmaster. We went hard, several years ago, to enable and empower our development teams to do their own thing in a lot of spaces, and that’s true in the Kubernetes world, where we’re very flexible. What we’ve learned is that, a lot of times, the teams want more structure than what we’ve given them,” Nichols said. “They want us to be more opinionated about things and to focus on the business value, and not on fundamental things like monitoring. Not all of the development tools we’ve adopted in the past have been detailed in how we allow that, and Operators are a step forward for this.”
Ticketmaster powers unforgettable moments through its ticketing products, with one happening every 20 minutes on average. High traffic and high transaction volumes, however, follow a less regular schedule. Ticketmaster’s big all-hands-on-deck moments take place during the minutes after a big show goes on sale. This is when the Ticketmaster technology teams are especially concerned with watching their Prometheus results, keeping an eye on near real-time metrics to spot problems early.
Today, Nichols said, when you “land on the site to buy tickets, a significant portion of that is running Kubernetes and Prometheus.” These systems are also used to control “portions of what enables the engines for entry at events,” he said.
That means even the legacy systems inside Ticketmaster are now tied into those cloud-based and Kubernetes-based services. “There is a portion of our ticketing systems that comes from that legacy model. We’re modernizing and moving some of that around,” Nichols said. “There is an effort to think about how that might look in a CNCF model today, inside the company. That’s ultimately a very small portion of the very back end of our ticketing system.”
The older systems had interfaces that allowed distributed systems to model and protect them. “That’s a core part of what our system is building out: that protection for other legacy systems that live in the company,” Nichols said.