DataStax, a keeper of the open source Apache Cassandra NoSQL database, has open sourced a Cassandra Kubernetes operator in an attempt to make the database a premier choice for cloud native development on Kubernetes. A Kubernetes operator is a method of packaging, deploying and managing a Kubernetes application, and while this isn’t the first operator for Cassandra, Sam Ramji, chief strategy officer at DataStax, says that the company hopes to consolidate efforts with this addition.
“We’ve been pretty stoked to see how the community is working on different ways to take Cassandra and make it cloud native. There are a half a dozen Kubernetes operators, so we’re contributing ours, open sourcing it and then working to bring everybody to the same project and find a way to have one great Kubernetes operator that anybody using open source Cassandra can deploy,” said Ramji in an interview with The New Stack.
Patrick McFadin, vice president of developer relations at DataStax, echoed the sentiment, remarking that the community really needs to come together to make Cassandra the default for Kubernetes.
“We’re not proffering that this is the one true way. Everyone’s having the same problem, where running Cassandra can be difficult. It takes a certain amount of expertise to run a large cluster. We’re trying to solve that with Kubernetes,” said McFadin. “There’s a lot of projects out there using Kubernetes, and we’ve been talking to all of them. People running Cassandra are super smart and if we work together, we can solve this really quickly, but we have to work together.”
While the Kubernetes operator can help users more easily bring Cassandra to their Kubernetes environment, McFadin also emphasized the management API that runs in a sidecar with this deployment, which he says truly enables Cassandra to be cloud native.
“The real magic is in that sidecar in the management API that does the heavy lifting. When you’re operating a cluster of Cassandra, each node operates independently, but each node is super important. A lot of that stuff that happens there, that sidecar is managing it, bringing it up, bringing it down, adding new nodes to the cluster, shrinking the cluster. All these things that are typically very hard operations,” said McFadin. “The sidecar would eliminate the need for a human to type things on the command line. Unfortunately, that’s how you have to manage a Cassandra cluster right now. It’s very operator intensive. With cloud native, everything has an API. This adds a management API to Cassandra, which brings it to cloud native.”
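To make that idea concrete, here is a minimal sketch of the control loop an operator might run against per-node management sidecars. It is a toy under stated assumptions: `FakeManagementApi`, its `start` and `decommission` methods, and the `node-N` naming are all hypothetical stand-ins, not the actual DataStax operator or management API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeManagementApi:
    """Hypothetical stand-in for a per-node management sidecar (not the real API)."""
    node: str
    log: List[str] = field(default_factory=list)

    def start(self) -> None:
        # A real sidecar would start the Cassandra process here.
        self.log.append(f"start {self.node}")

    def decommission(self) -> None:
        # A real sidecar would stream data away before removing the node.
        self.log.append(f"decommission {self.node}")


def reconcile(desired: int, running: List[str]) -> List[str]:
    """Return the ordered actions that move the cluster to `desired` nodes."""
    actions: List[str] = []
    # Scale out: start the missing nodes one at a time.
    for i in range(len(running), desired):
        api = FakeManagementApi(f"node-{i}")
        api.start()
        actions.extend(api.log)
    # Scale in: decommission the surplus nodes, highest-numbered first.
    for name in reversed(running[desired:]):
        api = FakeManagementApi(name)
        api.decommission()
        actions.extend(api.log)
    return actions
```

In this sketch, calling `reconcile(3, ["node-0"])` yields the two start actions a human would otherwise have to type out by hand, which is the kind of work McFadin describes the sidecar taking over.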
Part of the problem before now, said Ramji, was that users needed to provision their data store for peak load. In other words, if you have a 100-cluster environment, you provision a data store that can handle that level of traffic, but if your Kubernetes environment then scales down, the data store doesn’t scale down with it. “That’s provisioning into the Kubernetes environment and my sense is that that’s where a lot of technologies are playing today. The challenge is, what is the data environment that scales out with Kubernetes, scales in with Kubernetes, rides along with Kubernetes? That hasn’t really been done well. That’s because it’s an incredibly difficult technical problem to solve,” said Ramji.
Ultimately, Ramji says that Cassandra is a much better database option for Kubernetes than others, partly because it is a multi-master replicated database that doesn’t need to be sharded every time you add an order of magnitude of data.
“It will take care of all of that for you. So you just talk to it and ask it the same questions and write the same data that you always do. That means that you can just scale exactly as Kubernetes scales,” said Ramji. “That’s the big difference. You don’t have to rewrite your app logic around sharding because Cassandra does not have to be sharded.”
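One way to picture why the application needs no sharding logic of its own is a token ring: Cassandra hashes each partition key to a token and assigns ranges of tokens to nodes. The sketch below is a simplified illustration, not Cassandra's implementation. It assumes MD5 as the hash (Cassandra's default partitioner is Murmur3), and the `Ring` class, the `vnodes` count, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # MD5 is used only for this toy; Cassandra's default partitioner is Murmur3.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring with several virtual nodes per physical node."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # The first ring position at or after the key's token owns the key;
        # the modulo wraps around past the end of the ring.
        idx = bisect.bisect_right(self.tokens, (token(key), ""))
        return self.tokens[idx % len(self.tokens)][1]


ring = Ring(["node-0", "node-1", "node-2"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-3")  # the cluster grows; the lookup code stays the same
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The caller uses the same `owner(key)` lookup before and after the cluster grows, and the only keys that change owner are the ones the new node takes over. That is the property Ramji is pointing to when he says the application does not have to be rewritten around sharding.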
DataStax is a sponsor of The New Stack.