
How to Ensure Your Serverless Database Stays Serverless

24 Feb 2021 11:00am, by Andrew C. Oliver
Andrew is a freelance writer and software developer with a long history in open source, database and cloud computing. He founded Apache POI and served on the board of the Open Source Initiative. He also helped with marketing in startups including JBoss, Lucidworks and Couchbase.

Going serverless is easy. Compared to DevOps, Kubernetes and everything else one goes through in standard cloud architecture, it is like taking a warm bath. This is not to say there are “no ops,” because any compute functionality must be maintained, monitored and deployed. One may want to control what is on or off with feature flags, for example. Other issues arise, but nothing like provisioning servers and maintaining low-level infrastructure — certainly nothing like the personal hell that is Kubernetes configuration.

But serverless applications still struggle with state. If the database architecture has not changed to match the serverless application, then the concerns of managing the database and how the database represents the world inevitably leak into the application code. Once that happens, the serverless application has become more of a chimera — half serverless, half serverful, challenging to look at, and all pain. This article outlines what to keep in mind as one moves away from the more traditional RDBMS into a serverless approach to overcome these state struggles.

The First Rule of Serverless Databases

The first rule of serverless is that there are no servers. A serverless application must not worry about connecting to an actual database server. In fact, serverless frameworks like AWS Lambda make dealing with connections or connection pools quite painful, because function instances are short-lived and scale out independently, so a pool cannot be reliably shared across invocations. Serverless applications should interact through APIs and events. This interaction is the only way to keep things…well…serverless.
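As a rough illustration of interacting through an API rather than a connection, the sketch below shows a function handler that makes one stateless HTTPS call per invocation. The endpoint URL, the payload shape and the names `build_call` and `handler` are all hypothetical, not any particular product's API.

```python
import json
import urllib.request

# Hypothetical endpoint; a real serverless database defines its own URL and payload.
DB_API_URL = "https://db.example.com/v1/call"


def build_call(function_name, args, token):
    """Build a self-contained HTTPS request; no connection or pool is held open."""
    body = json.dumps({"function": function_name, "args": args}).encode("utf-8")
    return urllib.request.Request(
        DB_API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def handler(event, token, send=urllib.request.urlopen):
    """Function-style entry point: one request in, one API call out, no state kept."""
    request = build_call("get_order", {"order_id": event["order_id"]}, token)
    with send(request) as response:
        return json.loads(response.read())
```

The `send` parameter is only there so the handler can be exercised without a network. The point of the sketch is that nothing about the handler depends on a long-lived database session.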

Data is different from the rest of a serverless application or infrastructure. Stateless services can move around the cloud with little concern. State still exists somewhere on a disk — even if that disk is a chip now. That state might exist as copies, but something needs to worry about making sure those redundant copies and related data are consistent. A serverless application should not have to think about these issues. The infrastructure should guarantee consistently available persistence, as long as the application follows convention.

Go Functional for Integrity

Serverless convention is structuring the application as functions that manipulate object graphs. So going serverless is also about going functional. However, most databases have query and data manipulation languages that are, in essence, imperative programming models.

For instance, SQL might be declarative, but you have to string together multiple SQL statements in imperative code to do anything complicated. This imperative and usually remote code introduces opportunities for broken consistency in the underlying data.

One may think the database guarantees consistency, but developers often misunderstand isolation level guarantees and the temporal issue of when data becomes enrolled in a transaction. On a client-server application with relatively few users, one benefits from the improbability that these edge cases will happen, but on a large-scale distributed application, the probability flips.
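To make the risk concrete, here is a minimal sketch of a lost update using Python's built-in `sqlite3` module. The `accounts` table and the amounts are invented for illustration, and the two "clients" are simulated by interleaving their steps on a single in-memory connection rather than by real concurrency.

```python
import sqlite3


def make_db():
    # isolation_level=None puts the connection in autocommit mode,
    # so every statement below is its own transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    return conn


def read_balance(conn):
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]


def interleaved_read_modify_write(conn):
    """Two clients each read, compute in application code, then write back."""
    seen_by_a = read_balance(conn)  # client A sees 100
    seen_by_b = read_balance(conn)  # client B also sees 100
    conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (seen_by_a - 30,))
    conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (seen_by_b - 50,))
    return read_balance(conn)


def atomic_updates(conn):
    """The same two withdrawals, each expressed as a single statement."""
    for amount in (30, 50):
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,)
        )
    return read_balance(conn)


lost = interleaved_read_modify_write(make_db())  # A's withdrawal is overwritten
safe = atomic_updates(make_db())                 # both withdrawals are applied
```

In the first version, client B's write is based on a stale read, so the withdrawal of 30 disappears. Every individual statement succeeded. The inconsistency comes from the imperative code that strings them together.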

For traditional relational databases like PostgreSQL, the answer is usually to create a stored procedure (not that this solves every issue). However, deploying and maintaining stored procedures is somewhat unnatural for a serverless application and is not very “low ops.” The default procedural language, PL/pgSQL, is also a “weird” language for most application developers. PostgreSQL does support Python as well, and there is an external module that supports JavaScript. But these database implementations of Python or JavaScript feel totally different from coding a normal Python or Node.js application. Also, most samples and documentation are in the default language.

Stored procedures also introduce application logic into the database and database axioms into the application logic. In other words, they leak concerns in both directions. Moreover, cursors and refcursors have efficiency problems, and transaction control in stored procedures that use cursors can be complicated and produce unexpected data integrity problems.

Serverless databases need to execute functions as complete transactional operations. In essence, a developer should be able to send a function from the application layer to the database and have it perform that as a discrete operation. The function should look, taste and smell like regular functional application code, with the difference being in manipulating stateful objects. A serverless database should exceed the real-world transactional integrity of even a client-server RDBMS, by moving manipulation into functions and sending those to the database.
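A toy model can show the shape of this idea. The sketch below assumes a hypothetical in-memory `TinyStore` whose `transact` method accepts an ordinary function, runs it against a private copy of the state, and publishes the result only if the function returns normally. It is not how any real database implements transactions, only an illustration of the programming model.

```python
import copy
import threading


class TinyStore:
    """Hypothetical in-memory store that runs a function as one atomic operation."""

    def __init__(self, initial):
        self._state = copy.deepcopy(initial)
        self._lock = threading.Lock()

    def transact(self, fn):
        # One function runs at a time, against a private draft of the state.
        with self._lock:
            draft = copy.deepcopy(self._state)
            result = fn(draft)   # if fn raises, the draft is simply discarded
            self._state = draft  # otherwise all of its changes appear together
            return result

    def snapshot(self):
        with self._lock:
            return copy.deepcopy(self._state)


def transfer(src, dst, amount):
    """Build a plain function that the store can execute as a unit."""
    def operation(accounts):
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount
        return accounts[src]
    return operation


store = TinyStore({"alice": 100, "bob": 0})
remaining = store.transact(transfer("alice", "bob", 30))

try:
    # Fails after the debit has been applied to the draft; nothing is published.
    store.transact(transfer("alice", "bob", 500))
except ValueError:
    pass
```

The `transfer` function reads like regular application code, yet a failure halfway through leaves no partial debit behind, because the store commits or discards the whole function.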

Serverless/Database Mismatch

The difference between the application model and how the database understands data is an opportunity for consistency errors, and it costs performance and scale. Modern object-relational mapping tools can hide the problem, but they both introduce complexity and require data transformation. When every operation requires multiple layers of transformation, that adds up to a high cost.
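The transformation cost is easy to see in miniature. In this sketch, the `Order` and `LineItem` classes and the two-table row layout are invented for illustration. Persisting the object relationally means flattening it into rows and reassembling it on the way back, while a document-style representation is a direct rendering of the same graph.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[LineItem] = field(default_factory=list)


def to_rows(order):
    """Flatten the object graph into rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, item.sku, item.quantity) for item in order.items]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Reassemble the object graph from rows, as a mapping layer would."""
    order_id, customer = order_row
    items = [LineItem(sku, qty) for oid, sku, qty in item_rows if oid == order_id]
    return Order(order_id, customer, items)


order = Order(1, "acme", [LineItem("widget", 2), LineItem("gadget", 1)])

# Relational path: two transformations for every round trip.
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)

# Document path: the stored shape already matches the application's shape.
document = asdict(order)
```

Each of `to_rows` and `from_rows` is code that must be written or generated, executed on every operation, and kept in step with the schema.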

In the cloud, you pay those costs in fractional cents per operation — but they add up. The goal is not to have developers painstakingly optimize the cost of operations, but to use tools and APIs that fit into and follow best practices.

The underlying infrastructure should understand availability zones, geography, ACID guarantees, and all of the stuff that goes into managing data consistently in the cloud. There should be a way to organize the data as functional applications with collections of entities, but that organization should closely match how the application organizes its data and not introduce a mismatch or a new paradigm. It goes without saying that the database should autoscale and fit the consumption model of the serverless infrastructure.

Deployment Semantics

Frequently, database conventions get in the way of service and microservice deployment. Consider the case of deploying a service dependent on an underlying datastore. In a traditional SQL database, this may involve deploying not just the service but SQL DDL and DML, to alter the table structures and mutate the underlying data.

Any stored procedures have to be deployed to the database through the same method. So the service has to be taken down, the data manipulated, the procedures altered or deployed, and the service restored. Any dependent services are also down during this outage. This kind of database deployment requires complete service unavailability. It cannot be done in one availability zone at a time, for instance. Regardless of the inconvenience or uptime issues, that just does not “taste” serverless. While a serverless or NoSQL database does not entirely avoid data transformation issues, it mitigates them, especially if the microservice contains discrete functions for data operations.
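One way a discrete data function can soften this is to upgrade documents as they are read instead of migrating everything during an outage. This "upgrade on read" approach is a common pattern with flexible document stores, named here as an illustration rather than something the article prescribes. The `schema_version` field and the name-splitting change are hypothetical.

```python
def upgrade(doc):
    """Return the document in the current shape without mutating the stored copy."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)  # documents without a version are v1
    if version == 1:
        # v2 splits the single "name" field into two fields.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        doc["schema_version"] = 2
    return doc


old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Grace", "last_name": "Hopper", "schema_version": 2}

upgraded = upgrade(old_doc)
unchanged = upgrade(new_doc)
```

With a function like this in the service, old and new documents can coexist, so a new version of the service can roll out gradually while existing data is converted lazily.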

Finally, multitenancy capabilities and a flexible document structure offer other opportunities to avoid an outage or at least to narrow it (e.g., to specific customers only). The modern development and service ethos is that an outage should not be required for deployment and, where one is necessary, it should affect the smallest possible number of users.

What to Look for in a Serverless Database

A serverless data architecture should look like a serverless API. It should be functional, including the query and data manipulation language. The database should guarantee consistency inside of a function. The database model should look like an object graph — as similar to the application’s object graph as possible. The consumption model should match the rest of the services in the overall application or system.

However, a list of capabilities does not quite capture what a serverless database should be. There is a certain “serverless flavor,” a je ne sais quoi, that goes beyond operations to the setup, the way the code works, and really everything that goes into working with the database. It fits naturally with the rest of the serverless infrastructure and architecture. It just “feels serverless.”

