
Why We’re Porting Our Database Drivers to Async Rust

4 Apr 2022 7:17am, by

Piotr Sarna
Piotr is a software engineer very keen on open source projects and C++. He previously developed an open source distributed file system (LizardFS) and had a brief adventure with the Linux kernel during an apprenticeship at Samsung Electronics. Piotr graduated from University of Warsaw with a master's degree in computer science.

The latest client-side driver for ScyllaDB, a fast and scalable NoSQL database, is written in pure Rust with a fully async API using Tokio.

Although optimized for ScyllaDB, the driver is also compatible with Apache Cassandra.

This Rust driver started as a humble hackathon project, but it has grown to become our fastest and safest Cassandra Query Language (CQL) driver.

In our benchmarks, we happily observed that the ScyllaDB Rust Driver beats even the reference C++ driver in terms of raw performance. That gave us an idea: Why not unify all our drivers to use Rust underneath?

Benefits of a Unified Core

Database drivers are cool and all, but there’s one fundamental problem with them: a huge amount of reinventing the wheel, implementing the same logic over and over for every possible language in which people write applications. Implementing the same thing in so many languages also increases the chances that some of the implementations have subtle bugs, performance issues, bit rot and so on. It sounds tempting to deduplicate as much as possible by reusing a common core. In our case, that would be the Rust driver.

Easier Maintenance

When most of the logic is implemented once, maintainers can focus on this central implementation with extensive tests and proper review. It takes much less manpower to keep a single project up to date than to try to manage one project for each language. Once a new feature is added to ScyllaDB, it’s possible that only the core will need to be updated, and all the derivative implementations will benefit from it automatically.

Fewer Bugs

It goes without saying that deduplication helps reduce the number of bugs because there’s simply less code where a bug can occur. Backporting urgent fixes also gets substantially easier because the same fix doesn’t have to be carefully rewritten in each supported language.

Even better, existing drivers already have their own test suites for unit tests, integration tests and so on. A single-core implementation would therefore be tested by many independent test harnesses and lots of cases. Sure, the majority of them will overlap. However, there’s no such thing as a perfect test suite. Using several reduces the probability of missing a bug and generally improves test coverage. And all the tests are already there, for free!

Performance

Some drivers are slow due to their outdated design. Some are faster because they’re implemented in a language that imposes less overhead. Some, like our Rust driver, are the fastest.

Similar to the way Python relies on modules compiled in C to make otherwise unbearably slow modules faster, our CQL drivers could benefit from a Rust core. A lightweight API layer would ensure that the drivers are still backward compatible with their previous versions, but the new ones will delegate as much work as possible straight to the Rust driver, trusting that it’s going to perform the job faster and more safely.

Rust’s asynchronous model is a great fit for implementing high-performance, low-latency database drivers because it’s scalable and allows high concurrency in your applications. Unlike many other languages, Rust does not build the layer responsible for running asynchronous tasks into the language itself; that layer is abstracted away and supplied by a library. This layer is called the runtime.

Being able to select, or even implement, your own runtime is a powerful tool for developers. After careful research, we picked Tokio as our runtime due to its active open source community; its focus on performance; its rich feature set, including a complete implementation of network streams, timers, etc.; and lots of fantastic utilities like tokio-console.
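To make the idea of a pluggable runtime more concrete, here is a minimal sketch of a toy executor built only on the standard library. It is not how Tokio or the ScyllaDB Rust Driver work internally; block_on, ThreadWaker and add_async are hypothetical names invented for this illustration. The point is that the async function contains no runtime-specific code, so whichever executor polls it decides how it runs.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical waker: unparks the thread that is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Hypothetical toy runtime: drives one future to completion on the
// current thread. Real runtimes such as Tokio add I/O, timers and
// multi-threaded scheduling on top of the same polling contract.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker signals that progress is possible.
            Poll::Pending => thread::park(),
        }
    }
}

// Stand-in for driver logic; it does not know which runtime polls it.
async fn add_async(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let sum = block_on(async { add_async(2, 3).await + add_async(10, 20).await });
    println!("sum = {sum}");
}
```

A production driver would not use an executor like this one; it only shows why the choice of runtime is left to the developer.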

Language Bindings

Writing code in one language in order to use it in another is common practice, and there are lots of tools available for the job. Rust is no exception. Its ecosystem is generally very developer-friendly, and there are many crates that make bindings with other languages effortless.

C/C++

Binding with C/C++ applications doesn’t actually require much effort anyway. Rust uses LLVM for code generation, and the output executables, libraries, and object files are more or less linkable with any C/C++ project. Still, there are a few good practices when using Rust and C/C++ in a single project.

First of all, make sure that name mangling won’t make it hard for the linker to find the functions you compiled in Rust. Anyone who ever wrote functions in C++ and used them from C is definitely familiar with the keyword extern "C", and a similar trick applies to Rust. Mark the functions that you mean to export with extern "C" so that they use the C calling convention, and add the #[no_mangle] attribute (spelled #[unsafe(no_mangle)] in the Rust 2024 edition) so that their names are not mangled in any way. Then, the linker will have an easier job matching the Rust parts with your C++ object files and executables.
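As a minimal sketch of what such an exported function can look like, the example below defines a function with the C calling convention and an unmangled symbol name. The function demo_strlen is hypothetical and exists only for illustration. Note that extern "C" selects the calling convention, while the symbol name is kept unmangled by a separate attribute: #[unsafe(no_mangle)] on Rust 1.82 and newer, or plain #[no_mangle] on older toolchains and editions before 2024.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical exported function: returns the byte length of a
/// NUL-terminated C string, or -1 if the pointer is null.
///
/// # Safety
/// `s` must be null or point to a valid NUL-terminated string.
// Older toolchains (before Rust 1.82) spell this attribute `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_strlen(s: *const c_char) -> i64 {
    if s.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `s` is a valid C string.
    let bytes = unsafe { CStr::from_ptr(s) }.to_bytes();
    bytes.len() as i64
}

fn main() {
    // Calling the function from Rust, the same way C code would call it.
    let text = CString::new("scylla").expect("no interior NUL bytes");
    let len = unsafe { demo_strlen(text.as_ptr()) };
    println!("length = {len}");
}
```

On the C or C++ side, the matching declaration would be along the lines of int64_t demo_strlen(const char *s); wrapped in extern "C" when compiled as C++.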

For an even smoother developer experience, the cxx crate can be used to reduce the amount of boilerplate code and make the bindings more robust.

Python

The Python CQL driver is extremely popular among ScyllaDB and Cassandra users, but, well, Python is not exactly well known for its blazing speed or scalability for high concurrency applications.

Fortunately, due to its dynamic typing and the interpreter being lenient, it’s also really easy to provide bindings to a Python application. The PyO3 crate sounds like it has great potential for simplifying the development of native Python modules.

Challenges

Even though there are lots of advantages to unifying the implementation of multiple drivers, there are also drawbacks. First of all, each tiny bug in the Rust core now has global scope; it would affect all the derivative drivers. Then, the glue code provided to bind our Rust driver with the target language is also a potential place where a bug can hide. And relying on third-party libraries for bindings adds yet another dependency to each driver.

Driverless?

Lately, it’s become popular to embrace the “driverless” way and expose an interface implemented for a well-known protocol like gRPC or HTTP(S). It’s an interesting point, and certain applications and developers could definitely benefit from that approach. However, going through yet another layer of protocols creates overhead (multiple rounds of serialization/deserialization, parsing the protocol frames and so on). Users should be able to opt in to the better performance that native CQL drivers provide.

What’s Already Done

Porting the CQL C++ driver was already mostly done during our internal hackathon. While it’s still a work in progress, it’s also very promising because the compatibility layer is quite thin for C++, partly because the application binary interfaces (ABIs) of both languages share many similarities.
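To give a feel for how thin such a compatibility layer can be, here is a minimal sketch of the common opaque-handle pattern for exposing a Rust object through a C-compatible API. It is not taken from the actual port; DemoSession and the demo_session_* functions are hypothetical names, and the "query" only increments a counter. The attribute #[unsafe(no_mangle)] assumes Rust 1.82 or newer; older toolchains spell it #[no_mangle].

```rust
// Hypothetical stand-in for a session object owned by the Rust core.
pub struct DemoSession {
    queries_run: u64,
}

impl DemoSession {
    // Pretends to run a query and returns how many have run so far.
    fn execute(&mut self, _query: &str) -> u64 {
        self.queries_run += 1;
        self.queries_run
    }
}

/// Allocates a session and hands ownership to the caller as an opaque pointer.
#[unsafe(no_mangle)]
pub extern "C" fn demo_session_new() -> *mut DemoSession {
    Box::into_raw(Box::new(DemoSession { queries_run: 0 }))
}

/// Runs a pretend query; returns the running count, or 0 for a null handle.
///
/// # Safety
/// `session` must be null or a live pointer from `demo_session_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_session_execute(session: *mut DemoSession) -> u64 {
    match unsafe { session.as_mut() } {
        Some(s) => s.execute("SELECT ..."),
        None => 0,
    }
}

/// Takes ownership back from the caller and frees the session.
///
/// # Safety
/// `session` must be null or a pointer from `demo_session_new` that has
/// not been freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_session_free(session: *mut DemoSession) {
    if !session.is_null() {
        drop(unsafe { Box::from_raw(session) });
    }
}

fn main() {
    // Exercise the API the way a C++ caller would.
    let session = demo_session_new();
    let count = unsafe { demo_session_execute(session) };
    println!("queries run = {count}");
    unsafe { demo_session_free(session) };
}
```

The C++ side would only ever see a forward-declared struct and three functions, so an existing C++ API could keep its shape while the work happens in Rust.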

Summary

Unifying the drivers is a rather large and complicated task. We’ve only just begun, but we have high hopes for the future performance and robustness of all our CQL drivers. I shared a bit more about our journey in a recent talk: “ScyllaDB Rust Driver: One Driver to Rule Them All.” Also, if you’ve read this far, maybe you’d like to become a contributor to the ScyllaDB native Rust driver? Join us at https://github.com/scylladb/scylla-rust-driver.
