TNS
VOXPOP
Where are you using WebAssembly?
Wasm promises to let developers build once and run anywhere. Are you using it yet?
At work, for production apps
0%
At work, but not for production apps
0%
I don’t use WebAssembly but expect to when the technology matures
0%
I have no plans to use WebAssembly
0%
No plans and I get mad whenever I see the buzzword
0%
Data / Serverless / Storage

ScyllaDB’s Incremental Changes: Just the Tip of the Iceberg

The ScyllaDB team works with their garage doors up. A look at what kept the team busy in 2022, plus trends and tradeoffs in high-performance compute and storage.
Mar 20th, 2023 9:29am by
Featued image for: ScyllaDB’s Incremental Changes: Just the Tip of the Iceberg

Is incremental change a bad thing? The answer, as with most things in life, is “it depends.” In the world of technology specifically, the balance between innovation and tried-and-true concepts and solutions seems to have tipped in favor of the former. Or at least, that’s the impression the headlines give. Good thing there’s more to life than headlines.

Innovation does not happen overnight, and is not applied overnight either. In most creative endeavors, teams work relentlessly for long periods until they are ready to share their achievements with the world. Then they go back to their garage and keep working until the next milestone is achieved. If we were to peek in the garage intermittently, we’d probably call what we’d see most of the time “incremental change.”

The ScyllaDB team works with their garage doors up and are not necessarily after making headlines. They believe that incremental change is nothing to shun if it leads to steady progress. Compared to the release of ScyllaDB 5.0 at ScyllaDB Summit 2022, “incremental change” could be the theme of ScyllaDB Summit 2023 in February. But this is just the tip of the iceberg, as there’s more than meets the eye here.

I caught up with ScyllaDB CEO and co-founder Dor Laor to discuss this.

The ScyllaDB team works with their garage doors up. A look at what kept the team busy in 2022, plus trends and tradeoffs in high-performance compute and storage.

Note: In addition to reading the article, you can hear the complete conversation in this podcast:

Mini player:

Data Is Going to the Cloud in Real Time, and so Is ScyllaDB

The ScyllaDB team has their ears tuned to what their clients are doing with their data. What I noted in 2022 was that data is going to the cloud in real time, and so is ScyllaDB 5.0. Following up, I wondered whether those trends have kept pace with the way they manifested previously.

The answer, Laor confirmed, is simple: Absolutely yes. ScyllaDB Cloud, the company’s database-as-a-service, has been growing over 100% year over year in 2022. In just three years since its introduction in 2019, ScyllaDB Cloud is now the major source of revenue for ScyllaDB, exceeding 50%.

“Everybody needs to have the core database, but the service is the easiest and safest way to consume it. This theme is very strong not just with ScyllaDB, but also across the board with other databases and sometimes beyond databases with other types of infrastructure. It makes lots of sense,” Laor noted.

Similarly, ScyllaDB’s support for real-time updates via its change data capture (CDC) feature is seeing lots of adoption. All CDC events go to a table that can be read like a regular table. Laor noted that this makes CDC easy to use, also in conjunction with the Kafka connector. Furthermore, CDC opens the door to another possibility: Using ScyllaDB not just as a database, but also as an alternative to Kafka.

“It’s not that ScyllaDB is a replacement for Kafka. But if you have a database plus Kafka stack, there are cases that instead of queuing stuff in Kafka and pushing them to the database, you can just also do the queuing within the database itself” Laor said.

This is not because Kafka is bad per se. The motivation here is to reduce the number of moving parts in the infrastructure. Palo Alto Networks did this and others are following suit too. Numberly is another example in which ScyllaDB was used to replace both Kafka and Redis.

High Performance and Seamless Migration

Numberly was one of the many use cases presented in ScyllaDB Summit 2023. Others included the likes of Discord, Epic Games, Optimizely, ShareChat and Strava. Browsing through those, two key themes emerge: High performance and migration.

Migration is a typical adoption path for ScyllaDB as Laor shared. Many users come to ScyllaDB from other databases in search of scalability. As ScyllaDB sees a lot of migrations, it offers support for two compatible APIs, one for Cassandra and one for DynamoDB. There are also several migration tools, such as Spark Migrator, scanning the source database and writing to the target database. CDC may also help there.

While each migration has its own intricacies, when organizations like Discord or ShareChat migrate to ScyllaDB, it’s all about scale. Discord migrated trillions of messages. ShareChat migrated dozens of services. Things can get complicated and users will make their own choices. Some users rewrite their stack without keeping API compatibility, or even rewrite parts of their codebase in another programming language like Go or Rust.

Epic Games, Optimizely and Strava all presented high-performance use cases. Epic Games are the creators of the Unreal game engine. Its evolution reflects the way that modern software has evolved. Back in the day, using the Unreal game engine was as simple as downloading a single binary file. Nowadays, everything is distributed. Epic Games works with game makers by providing a reference architecture and a recommended stack, and makers choose how to consume it. ScyllaDB is used as a distributed cache in this stack, providing fast access over objects stored in AWS S3.

A Sea of Storage, Raft and Serverless

ScyllaDB increasingly is being used as a cache. For Numberly, it happened because ScyllaDB does not need a cache, which made Redis obsolete. For Epic Games, the need was to add a fast-serving layer on top of S3.

S3 works great and is elastic and economic, but if your application has stringent latency requirements, then you need a cache, Laor pointed out. This is something a lot of people in the industry are aware of, including ScyllaDB engineering. As Laor shared, there is an ongoing R&D effort in ScyllaDB to use S3 for storage too. As he put it:

“S3 is a sea of extremely cheap storage, but it’s also slow. If you can marry the two, S3 and fast storage, then you manage to break the relationship between compute and storage. That gives you lots of benefits, from extreme flexibility to lower TCO.”

This is a key project that ScyllaDB’s R&D is working on these days, but not the only one.

In the upcoming open source version 5.2, ScyllaDB will have a consistent transactional schema operation based on Raft. In the next release, 5.3, transactional topology changes will also be supported. Metadata strong consistency is essential for sophisticated users who programmatically scale the cluster and Data Definition Language.

This paves the way toward changes in the way ScyllaDB shards data, making it more dynamic and leading to better load balancing.

Many of these efforts come together in the push toward serverless. A free trial based on serverless was made available at the ScyllaDB summit and will become generally available later this year.

P99, Rust and Beyond

Laor does not live in a ScyllaDB-only world. Being someone with a hypervisor and Linux Red Hat background, he appreciates the nuances of P99. P99 latency is the 99th latency percentile. This means 99% of requests will be faster than the given latency number, or that only 1% of the requests will be slower than your P99 latency.

P99 CONF is also the name of an event on all things performance, organized by the ScyllaDB team but certainly not limited to them. P99 CONF is for developers who obsess over P99 percentiles and high-performance, low-latency applications. It’s where developers can get down in the weeds, share their knowledge and focus on performance in real time.

Laor said that participants are encouraged to present databases, including competitors, as well as talk about operating systems development, programming languages, special libraries and more.

ScyllaDB’s original premise was to be a faster implementation of the Cassandra API, written in C++ as opposed to Java.

Although ScyllaDB has invested heavily in its C++ codebase and perfected it over time, Laor also gives Rust a lot of credit. So much, in fact, that he said it’s quite likely that if they were to start the implementation effort today, they would have done it using Rust. Not so much for performance reasons, but more for the ease of use. In addition, many ScyllaDB users like Discord and Numberly are moving to Rust.

Even though a codebase that has stood the test of time is not something any wise developer would want to get rid of, ScyllaDB is embracing Rust too. Rust is the language of choice for the new generation of ScyllaDB’s drivers. As Laor explained, going forward the core of ScyllaDB’s drivers will be written in Rust. From that, other language-specific versions will be derived. ScyllaDB is also embracing Go and Wasm for specific parts of its codebase.

To come full circle, there’s a lot of incremental change going on. Perhaps if the garage door wasn’t up, and we only got to look at the finished product in carefully orchestrated demos, those changes would stack up more impressions. Apparently, that’s not what matters more for Laor and ScyllaDB.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Pragma, Optimizely, Real.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.