Data on Kubernetes: How We Got Here, What’s Next
DETROIT — Though Kubernetes was not designed to handle stateful applications, more than three-quarters (76%) of the participants in a survey by the Data on Kubernetes (DoK) Community now use databases on Kubernetes.
That’s just one of the nuggets from the survey of 500 Kubernetes users that found not only a sizeable leap in use of K8s for data workloads in the past year, but also growth in use for analytics and machine learning, and in the belief that using Kubernetes for data has a transformative impact on their organizations.
At DoK Day, one of the co-located events at KubeCon+CloudNativeCon North America, a panel discussed how we got to this point, lessons learned and where to start in using Kubernetes on data projects. The DoK community, founded in June, has over 4,000 members and has held over 100 meetups worldwide.
The panel members were Xing Yang, tech lead at VMware; Gabriele Bartolini, vice president of cloud native at EDB; Patrick McFadin, vice president of developer relations at DataStax; Bhavin Shah, senior technical marketing manager at Pure Storage; and Ryan Wallner, lead developer advocate at Dell.
First off: Why use Kubernetes for data?
Basically, companies don’t want to run two different kinds of infrastructure, Yang pointed out.
“They want to be able to manage their data layer the same way that they manage their applications, using the same tools and also, with Kubernetes, you can easily deploy and scale your SQL workloads,” she said.
Bartolini said the reason might differ among different personas: “If you’re a developer, you might be interested in, for example, the capabilities that databases like Postgres offer you, and you want to bring them into Kubernetes. If you are a Kubernetes administrator, you might be interested in declarative configuration, Infrastructure as Code. So you’re looking for an operator that can help you manage databases in that way.”
But he said his favorite reason is DevOps as a culture, one that can break down the barriers among developers, administrators and database administrators, and bring databases together with applications inside Kubernetes.
McFadin maintained that if you’re running a database outside of Kubernetes, you cannot call yourself cloud native. You’re running your infrastructure in two different places.
“And until we have a better orchestration framework, Kubernetes is the winner. … This is the future that we are trying to build, which is highly automated infrastructure that just does what we need to do. But we’re getting to the point now where our infrastructure will conform to our application rather than our application conforming to our infrastructure.
“And how do we get there? We need to be able to do things declaratively, bespoke. And right now databases are not that. When you install a database, you create this edifice that everyone has to pray to, you make promises … you create these gods among men, which are called DBAs. That era is over. We need to stop doing this. All infrastructure is infrastructure.”
Shah pointed to benefits Kubernetes can bring to databases on a day-to-day basis, like high availability, pod disruption budgets and the ability to perform non-disruptive rolling upgrades.
Wallner, acting as emcee, then asked the group where they have come from with Kubernetes and what they see in the future.
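To make the pod disruption budget idea concrete, here is a minimal Python sketch that builds a `policy/v1` PodDisruptionBudget manifest as a plain dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The object name `pg-pdb`, the `app: postgres` label and the helper names are illustrative assumptions, not anything the panel described. The second helper is a simplified model of the rule the budget expresses; real eviction handling in Kubernetes involves more than this one comparison.

```python
import json


def pod_disruption_budget(name, app_label, min_available):
    """Build a policy/v1 PodDisruptionBudget manifest as a plain dict.

    All argument values used below are hypothetical examples.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Voluntary evictions (for example a node drain during a
            # rolling upgrade) are refused if they would leave fewer
            # than this many matching pods available.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def eviction_allowed(healthy_pods, min_available):
    """Simplified model: may one more pod be evicted voluntarily?"""
    return healthy_pods - 1 >= min_available


pdb = pod_disruption_budget("pg-pdb", "postgres", 2)
print(json.dumps(pdb, indent=2))
```

With three healthy database pods and `minAvailable` set to 2, one pod at a time can be evicted, which is what lets an upgrade roll through the cluster without taking the database below its chosen floor.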
McFadin talked about working with Cassandra and trying to automate as much as possible with bash. And then DevOps happened, and Kubernetes was an obvious next extension.
“But what always bothered me was that every time we do something in DevOps, the database was never included. I’ve always worked in data infrastructure, and that seems like a problem. Just working within the Cassandra community, it was clear that people were trying and failing, or trying and succeeding … [but] I’ve always wanted something like this. And here we are; we finally have a chance. We got this. We could actually pull this off this time, and not have to run an install script,” he said.
Bartolini said he began talking about DevOps with Postgres about 10 years ago.
“What I really liked is that Kubernetes is the single authority between databases and applications. And that’s why I say Postgres is a database in Kubernetes not on Kubernetes. … If they are together, the same authority can control applications, routing and databases. When I discovered Kubernetes, I saw that that was possible. And in my opinion, so far, that’s the best experience you can get of a Postgres database,” he said.
Referring to the Container Storage Interface (CSI), Yang said, “It has basic functions like create the volume, attach, detach, mount and unmount. It allows the different storage systems to link storage to containers through common APIs.
“One feature to mention is CSI topology, which allows Kubernetes to do intelligent scheduling. So that dynamically provisions the persistent volumes at the best place where you can run your pod and allows you to deploy and scale your applications across different domains and different failure domains.”
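One way this topology-aware provisioning is commonly configured is through a StorageClass that delays volume binding until a pod is scheduled. The Python sketch below builds such a manifest as a dictionary and prints it as JSON. The class name, the provisioner `csi.example.com` and the zone names are placeholders assumed for illustration; a real cluster would use the name of its installed CSI driver.

```python
import json


def topology_aware_storage_class(name, provisioner, zones):
    """Build a storage.k8s.io/v1 StorageClass manifest as a plain dict.

    The provisioner and zone values passed below are hypothetical.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Wait until a pod using the claim is scheduled, so the volume
        # is provisioned where that pod can actually run.
        "volumeBindingMode": "WaitForFirstConsumer",
        # Optionally restrict provisioning to specific zones.
        "allowedTopologies": [
            {
                "matchLabelExpressions": [
                    {
                        "key": "topology.kubernetes.io/zone",
                        "values": list(zones),
                    }
                ]
            }
        ],
    }


sc = topology_aware_storage_class(
    "zonal-db-storage", "csi.example.com", ["zone-a", "zone-b"]
)
print(json.dumps(sc, indent=2))
```

Listing more than one zone is what lets replicas of a database land in different failure domains while each volume is still created close to the pod that uses it.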
Bartolini said that the addition of local persistent volumes was a game-changer. As was having the ability to fail over easily, Wallner added.
“We heard earlier from some of the talks that we ran databases, had opportunities, and it wasn’t the best experience. So … the evolution is communities getting better,” he said.
Added Shah: “If you do a quick Google search for, like, why stateful apps or why use a database on Kubernetes, you won’t see the debate that was happening earlier in the day, why should you even run stateful apps or databases on Kubernetes. [These sessions] have shown how things have improved.”
So the next question was about lessons learned.
“I think the best advice I can give is to start from Day Zero,” Bartolini said. “Start from Day Zero and plan for the database or work with the infrastructure. That’s one of the advantages of Kubernetes is that through configuration, you can actually choose the topology. … So I think the next challenge is to have a better way to manage your resources across Kubernetes clusters. But otherwise, you can share, for example, a Postgres cluster with other workloads or dedicated nodes, even bare metal machines that run a single Postgres instance. And that’s all done declaratively. I found that amazing. There’s a lot of flexibility.”
Yang urged people to use common APIs when possible, and to learn about Kubernetes’ self-healing properties.
McFadin pointed to the three basics: people, process and technology.
“With people, the most important thing is you are not going to get there with what you had yesterday. That doesn’t mean that you have to fire people. You just need to rethink the way you do infrastructure. And that is a big sea change that will stop you from being what you want to be in a cloud native world. … Process. We’ve already talked about technology. The thing that I think that most people need to understand is that the technology may look the same. It may quack like a duck. But it’s definitely not a duck.
“I’ll give you a couple of examples. Cloud native technology now is converging towards object storage. Block storage is dying; it’s going to die, get over it. Flink, Pulsar, Spark, the Cassandra project — we’re working on this now — is going to use object storage. Capacity storage is not going to be block storage. Block storage is hard. It’s multifaceted. This is a way that we all have to go. We’re going to the future. Hang on.”
The participants all pointed to a wealth of resources such as books, online sites like YouTube and the DoK community podcasts to help people new to Kubernetes learn how to get started on K8s for their data projects.