
ScyllaDB’s Take on WebAssembly for User-Defined Functions

We're adding helper libraries for Rust and C++, which will make writing a user-defined function no harder than writing a regular native function in any language.
Dec 8th, 2022 9:34am

WebAssembly, also known as Wasm, is a binary format for representing executable code, designed to be easily embedded into other projects. It turns out that Wasm is also a perfect candidate for user-defined functions (UDFs) on the backend, thanks to its ease of integration, performance and popularity.

ScyllaDB, the database for data-intensive applications that require high throughput and low latency, supports user-defined functions expressed in WebAssembly. The support is built on Wasmtime, an open source runtime written in Rust. In fact, we recently added Rust support to our build system to make future integrations even smoother.

This article explains how and why we integrated with WebAssembly.

Choosing the Right Engine

WebAssembly is a format for executable code designed first and foremost to be portable and embeddable. As its name suggests, it’s a good fit for web applications. Since it’s quite fast, it’s also generally a good choice for an embedded language.

One of WebAssembly’s core features is isolation. Each module is executed in a sandboxed environment separate from the host application. Such a limited-trust environment is highly desirable for an embedded language because it vastly reduces the risk of somebody running malicious code from within your project.

Wasm is a binary format, but it also specifies a human-readable text format called WebAssembly Text format or WAT.

To integrate WebAssembly into a project, you need to pick an engine. The most popular engine is Google’s V8, which is implemented in C++, also runs JavaScript and has a very rich feature set. Unfortunately, it’s also quite heavy and not very easy to integrate with asynchronous frameworks like Seastar, an open source C++ framework for high-performance server applications on modern hardware, which is a building block of ScyllaDB.

Fortunately, there’s also Wasmtime, a smaller (but not small!) project implemented in Rust. It supports WebAssembly but not JavaScript, which makes it more lightweight. It also has good support for asynchronous environments and offers C++ bindings, making it a good fit for injecting into ScyllaDB for a proof-of-concept implementation.

ScyllaDB selected Wasmtime because it’s lighter than V8 and has the potential to be async-friendly. Though we currently use the existing C++ bindings provided by Wasmtime, we plan to implement this whole integration layer in Rust and then compile it directly into ScyllaDB.

Coding in WebAssembly

So how would you create a WebAssembly program?

WebAssembly Text Format

First, modules can be coded directly in WebAssembly text format. It’s not the most convenient way, at least for me, due to Wasm’s limited type system and specific syntax with lots of parentheses. But it’s possible, of course. All you need in this case is a text editor. Being in love with Lisp wouldn’t hurt either.

C++

C and C++ enthusiasts can compile their language of choice to Wasm with the clang compiler.


The binary interface is well defined, and the resulting binaries are well optimized. The code is compiled to WebAssembly by way of LLVM’s intermediate representation, which makes many optimizations possible.

Rust

Rust can also produce Wasm output within its own ecosystem: a wasm32 target is already supported in Cargo, Rust’s official build tool and package manager.
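As a rough illustration (not taken from ScyllaDB’s code), a function destined for a Wasm module can be ordinary Rust. The sketch below assumes a library crate built as a cdylib for the wasm32-unknown-unknown target, for example with cargo build --target wasm32-unknown-unknown --release. The function name fib is hypothetical, and the export attribute is only described in a comment so that the snippet also builds unchanged as native code. ScyllaDB’s own ABI has further requirements that this sketch does not cover.

```rust
// Hypothetical example function; the name `fib` is an assumption for illustration.
//
// To export this symbol by name from a Wasm module, one would typically also
// mark the function `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the
// 2024 edition). The attribute is left out here on purpose.
pub extern "C" fn fib(n: i64) -> i64 {
    // Iterative Fibonacci: after k iterations, `a` holds fib(k).
    let (mut a, mut b) = (0i64, 1i64);
    for _ in 0..n {
        // wrapping_add avoids a panic on overflow for very large n;
        // results past fib(92) no longer fit in an i64 and wrap around.
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    // For n <= 0 the loop does not run and 0 is returned.
    a
}
```

Integer parameters and return values like these map directly onto Wasm’s small set of numeric types, which is why simple numeric functions are the easiest starting point.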

AssemblyScript

There’s also AssemblyScript, a TypeScript-like language that compiles directly to WebAssembly. AssemblyScript is especially nice for quick experiments because it reads like a scripting language. It’s also the only language on this list that was invented and designed with WebAssembly as a compilation target in mind.

User-Defined Functions

Why does ScyllaDB need WebAssembly? Our first use case involves user-defined functions (UDFs). UDFs are a Cassandra Query Language (CQL) feature that allows you to define a function in a given language, then call that function when querying the database. The database itself applies the function to its arguments, and only then is the result returned to the client. UDFs also make it possible to express nested calls and other more complex operations.

Here’s how you can use a user-defined function in CQL:


UDFs are cool enough by themselves, but a more important purpose is enabling user-defined aggregates (UDAs). UDAs are custom accumulators that combine data from multiple database rows into potentially complex outputs. UDAs consist of two functions: one for accumulating the result for each argument, and another for finalizing and transforming the result into the output type.

The code example below shows an aggregate that computes the average length of all requested strings. The functions below are coded in Lua, which is yet another language that ScyllaDB supports.

First, let’s create all the building blocks: functions for accumulating partial results and transforming the final result:


Next, let’s combine them all into a user-defined aggregate:


Here’s how you can use the aggregate after it’s created:


One function accumulates partial results by storing the total sum of all lengths and the total number of strings. The finalizing function divides one by the other to return the result. In this case, the result is in the form of rendered text.
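The same two-step logic can be sketched outside the database. The Rust below is only an illustration of the accumulate-then-finalize pattern described above, not ScyllaDB’s API and not the original Lua code. The names State, accumulate, finalize and avg_length are hypothetical, and the two-decimal output format and the handling of an empty input are assumptions.

```rust
/// Running state: (total length of all strings, number of strings seen).
type State = (i64, i64);

/// Accumulating step: folds one value into the partial result.
/// `len()` counts bytes, not characters.
fn accumulate(state: State, value: &str) -> State {
    (state.0 + value.len() as i64, state.1 + 1)
}

/// Finalizing step: divides the total length by the count and renders text.
fn finalize(state: State) -> String {
    if state.1 == 0 {
        // Assumption: an empty input renders as zero instead of dividing by zero.
        return String::from("0.00");
    }
    format!("{:.2}", state.0 as f64 / state.1 as f64)
}

/// Drives both steps over a set of rows, roughly as an aggregate would.
fn avg_length(rows: &[&str]) -> String {
    let state = rows
        .iter()
        .fold((0i64, 0i64), |state, value| accumulate(state, value));
    finalize(state)
}
```

Keeping the state small and the finalizer separate is what lets partial results be computed row by row before a single output value is produced.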

The potential here is quite large: user-defined aggregates make database queries more powerful, for instance, by gathering complex statistics or transforming whole partitions into different formats.

Enter WebAssembly

To create a user-defined function in WebAssembly, we first need to write a function in Wasm text format or compile one to it. The function body is then simply registered with the CQL statement CREATE FUNCTION. That’s it!


Note that the declared language here is xwasm, which stands for “experimental Wasm.” Support for this language is still experimental in ScyllaDB.

The current design document is maintained at https://github.com/scylladb/scylladb/blob/master/docs/dev/wasm.md, and you’re welcome to take a look at it.

Our Roadmap

ScyllaDB’s WebAssembly support is in active development; here are some of our top goals.

Helper Libraries for Rust and C++

Writing functions directly in WAT format is not trivial because ScyllaDB expects the functions to follow our application binary interface (ABI) specification. To hide these details from developers, we’re in the process of implementing helper libraries for Rust and C++, which seamlessly provide ScyllaDB bindings.

With our helper libraries, writing a user-defined function will be no harder than writing a regular native function in your language of choice.

Rewriting the User-Defined Functions Layer in Rust

We currently rely on Wasmtime’s C++ bindings to expose a Wasm runtime for user-defined functions to run on. These C++ bindings have certain limitations, though. Specifically, they lack support for asynchronous operations, which is present in Wasmtime’s original Rust implementation.

The choice is abundantly clear — let’s rewrite it in Rust! Our precise plan is to move the entire user-defined functions layer to Rust, where we can fully utilize Wasmtime’s potential. With such an implementation, we’ll be able to run user-defined functions asynchronously, with strict latency guarantees.

We’ll only provide a thin compatibility layer between Seastar and Rust’s async model to enable polling Rust futures directly from ScyllaDB. The rough idea for binding Rust futures straight into Seastar is explained here.
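To make the idea of a host polling a Rust future more concrete, here is a minimal sketch that uses only the standard library. It is not ScyllaDB or Seastar code. YieldingTask, NoopWake and drive are hypothetical names, and the busy loop stands in for a real scheduler, which would run other work between polls and use the waker to decide when to poll again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; a real host would reschedule the task here.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that returns Pending `remaining` times before
/// producing `result`, standing in for a function that periodically
/// hands control back to the host.
struct YieldingTask {
    remaining: u32,
    result: i64,
}

impl Future for YieldingTask {
    type Output = i64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i64> {
        if self.remaining == 0 {
            Poll::Ready(self.result)
        } else {
            self.remaining -= 1;
            // Signal that the task wants to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Polls a future to completion; returns its output and the number of polls.
fn drive<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
        // A real host would run other tasks here instead of spinning.
    }
}
```

Each call to poll either finishes or returns control to the caller, so the host decides how much time a single function gets before something else runs.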

We have already added Rust support to our build system. The next step is to rewrite the user-defined functions engine as a native Rust implementation, which we can then compile right into ScyllaDB.

Keeping Latency Low for User-Defined Functions with WebAssembly

I shared more details about how we integrated WebAssembly and Wasmtime into our project in a latency-friendly manner at the recent P99 CONF, an open source, community-focused conference for engineers who obsess over low latency. The talk, “Keeping Latency Low for User-Defined Functions with WebAssembly,” is available on demand.
