Apache Cassandra 4.1: Building the Database Your Kids Will Use
Cassandra 4.1 will be released sometime in July, so now’s a good time to look back over the last year and reflect on where we are as a project. We committed to a yearly cadence and guess what? We kept the promise!
If you haven’t been watching the Cassandra community for a while, it’s been a busy place. I know many of you have Cassandra clusters you installed years ago and completely forgot about. By nature, a good database should be boring and forgettable. Something that’s always there and ready. We invite you to come back and see what’s new. Tell us about your absolutely boring database. Maybe an upgrade is in your future? Let me tell you what’s coming.
Stability Is a Feature
Cassandra 4.0 marked an important milestone for a 10-year-old database. The project put an enormous amount of resources into building tools that define and validate a stable database. Those tools were all integrated into the build and release pipeline to ensure that any future changes were thoroughly tested and never introduced a regression.
The result was one of the most stable databases ever shipped. Teams have been deploying 4.0 clusters over the past year at a furious pace, and the reports so far are overwhelmingly positive. Stability is a killer feature for a database, and if the world depends on Cassandra for storing critical data, we’re extremely happy that it’s living up to this promise.
Do you know what another killer feature of a database is? Shipping new features. There was a regrettable length of time between 3.11 and 4.0, but with a solid foundation built, we’re now moving at a regular cadence. Cassandra 4.1 is currently in pre-release, on track with our goal of yearly major releases. All the validation work done in 4.0 is paying off with a build pipeline that gives contributors confidence when building new functionality in Cassandra.
4.1’s Shiny New Thing: Pluggability
So what do we get from building on a stable core? The theme for Cassandra 4.1 is enabling feature plug-ins. Why are plug-ins useful, you ask? They are a structured way to add features to an existing product without changing the core code. For Cassandra, that means adding important new features without changing Cassandra itself. This will drive different kinds of innovation between major releases of the database.
One of the early drivers of this idea was Instagram. They had built a version of Cassandra, called Rocksandra, that used RocksDB as the underlying storage engine. It radically changed the way we thought of storage with Cassandra without changing the networking and node coordination. It also surfaced two distinct problems that needed to be addressed.
The first is the need for a clear interface to Cassandra internals, so that outside code and Cassandra share an understood contract. The Rocksandra team had to rely on deep knowledge of Cassandra internals to make it work. The second is the need for a reliable testing framework. Instagram understood its use case and had its own acceptance testing. For a general-purpose database, however, there needs to be a much wider scope of functional testing. Based in part on those lessons, the project has tackled both problems.
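To make the two lessons concrete, here is a minimal Python sketch of the general idea: a small interface that acts as the contract, and one shared acceptance check that any implementation has to pass. It is an illustration only. The names `StorageEngine`, `InMemoryEngine`, `Node` and `check_contract` are hypothetical and are not Cassandra's actual plug-in API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical contract between the core and a storage plug-in."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store value under key, replacing any earlier value."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the latest value for key, or None if it was never written."""


class InMemoryEngine(StorageEngine):
    """One possible plug-in: keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class Node:
    """Stands in for the core: it only ever talks to the interface."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    def write(self, key: str, value: str) -> None:
        self._engine.put(key, value)

    def read(self, key: str) -> Optional[str]:
        return self._engine.get(key)


def check_contract(engine: StorageEngine) -> bool:
    """Shared acceptance check that every engine is expected to pass."""
    engine.put("k", "v1")
    engine.put("k", "v2")  # a later write must win
    return engine.get("k") == "v2" and engine.get("missing") is None
```

Swapping in a different engine means writing another `StorageEngine` subclass and running `check_contract` against it. `Node` does not change, which is the point of having a plug-in boundary.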
Some of the new plug-in features that are available in 4.1 include:
- Storage — The feature is framed in terms of memtables, the in-memory structures Cassandra writes to and reads from before data is flushed to disk, so a pluggable memtable effectively changes how the underlying storage behaves. Expect to see some interesting implementations focusing on specific use cases, including fast memory storage and columnar formats.
- Network Encryption — Previous to this change, any SSL certificates had to be present in the local file system. This change allows for external key providers such as HashiCorp Vault to make key management easier in large deployments.
- Authentication — External and centralized authentication is a desirable feature for most organizations that manage a lot of infrastructure. This change allows for the Cassandra command-line tool, CQLSH, to use LDAP, Kerberos and others.
- Schema — Until now, system tables were the only place to store the cluster schema. For global coordination, especially in Kubernetes, external schema storage such as etcd is now an option.
- Guardrails — Operators around the world rejoice! You now have the ability to restrict anti-patterns in your production environment. An example would be limiting the number of indexes that can be added to a table. Guardrails are already in use in DataStax Astra, our managed Cassandra service, and the feature was donated to the project.
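The guardrail idea from the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation or configuration format: the `Threshold` class, its `warn_at` and `fail_at` fields, and the convention that -1 disables a limit are all hypothetical choices made for the example.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("guardrails")


class GuardrailViolation(Exception):
    """Raised when an operation exceeds a hard limit."""


@dataclass(frozen=True)
class Threshold:
    """Hypothetical guardrail with a soft (warn) and a hard (fail) limit."""

    name: str
    warn_at: int  # values above this log a warning; -1 disables the check
    fail_at: int  # values above this are rejected; -1 disables the check

    def check(self, value: int) -> str:
        # The hard limit is checked first so a rejected request never
        # falls through to the softer warning path.
        if self.fail_at >= 0 and value > self.fail_at:
            raise GuardrailViolation(
                f"{self.name}: {value} exceeds the limit of {self.fail_at}"
            )
        if self.warn_at >= 0 and value > self.warn_at:
            log.warning("%s: %d is above %d", self.name, value, self.warn_at)
            return "warn"
        return "ok"
```

With `Threshold("indexes per table", warn_at=2, fail_at=3)`, a table asking for a third index is allowed with a warning and a fourth is rejected. Separating the warn and fail levels lets an operator watch how often a limit would be hit before enforcing it.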
And Now, the Future! ACID Transactions and More
The future of Cassandra after 4.1 will be 5.0, and those discussions are already in full swing. This is the database designed for growth and the cloud native applications being built for the years to come. The next 10 years of Apache Cassandra are about rising to that challenge by building on the solid foundation of the previous 10 years.
The most transformative new 5.0 feature may shock you, but here we go: Cassandra will be adding full ACID transactions.
If you are in the camp of people saying you could never use Cassandra because it “doesn’t support transactions,” prepare to make room for more Cassandra in your life. Cassandra 2.0 added what we called “lightweight transactions” based on Paxos. To keep the guarantees that Cassandra delivers while not destroying performance, Paxos was the right choice at the time. Since 2013, other approaches to consensus have emerged, such as those used by Spanner and Raft, and both are very popular in the database world. But they require tradeoffs that aren’t aligned with the linear scale and uptime guarantees that Cassandra users expect.
What’s needed is the next generation of distributed consensus protocol. This is what a protocol known as Accord describes: it reaches consensus in one round trip and doesn’t require complicated leader failover mechanisms. In short, it enables Cassandra to be Cassandra while delivering full ACID transactions. We are building it now, so if you want to join in the conversation, we’d appreciate your input.
A less specific but important direction for Cassandra is the move to being more cloud native. The ability to add plug-ins and shape the deployment of your Cassandra cluster is a step in the right direction. Serverless databases in Kubernetes are the future, and Cassandra is well-positioned to be that database for years to come. DataStax released a white paper on a cloud native database based on Cassandra that could be a blueprint for the future.
For end users, it means scaling without manual effort and true multitenancy, where multiple applications share the same infrastructure. For operators, especially site reliability engineers, there are native Kubernetes deployments that separate compute from storage so each can scale independently. Most important: You provision only what you need, when you need it, to save on those ever-growing infrastructure bills.
Time to Celebrate
Last year, we had a party for the release of 4.0, and this year will be no different! The Cassandra World Party is a one-hour, online-only event that we’ll hold at three different times on July 20 to cover as many time zones as possible. The highlight will be the five-minute lightning talk format where the user community can get on and tell their story … quickly!
We hope you’ll join us and celebrate with the rest of the community. If you’re feeling up to the challenge, we would love to hear your story. Just tell us your topic and which time zone works for you. Hope to see you there!