Why Your Next Database Will Be a Serverless API
The early days of serverless computing left a sour taste for many developers. Initially, they saw a great opportunity in Functions-as-a-Service (FaaS): they could write code in a microservices style and no longer had to think about scaling or server configuration. There was also the promise of a further advantage that is sometimes viewed as the core component of serverless: “pay-as-you-go.”
Yet many first experiences with serverless did not live up to the hype. Some companies had success with select custom applications, while other applications appeared to be unsuited for serverless. The unsuited ones ran into the limits that providers impose in order to estimate and cope with rapidly changing demand for functions. War stories began to emerge from engineering teams that started with high hopes but ended in frustration when they hit these limits just as the project was nearly finished. Reported problems included cold starts, payload limits, runtime limits, and memory limits.
Yet as these problems were tackled one by one, serverless proved that it is here to stay, and it promises to be one of the key trends of 2020. Many engineers now understand the earlier limitations and can analyze more carefully whether a project is suitable for serverless. Moreover, since serverless is still evolving, the range of applications that can be built serverless-style is rapidly expanding. Many limitations have been addressed, and new providers with different focuses are emerging. Traditional providers such as AWS have reduced their cold starts, while new providers like Cloudflare, EdgeEngine, and fly.io offer “edge” serverless, which brings another advantage to the table: functions are automatically distributed across the globe, resulting in latencies lower than previously seemed achievable.
However, FaaS is only one of the many faces of serverless; the core ideas that have driven adoption (pay-as-you-go and zero operations) are being incorporated in many other domains. Where we used to pay for a service monthly, or buy a license and install a deployable on our own servers, a wide range of services now offer a pay-as-you-go model: authentication, analytics, content management, real-time notification, payment services, and databases. In essence, they provide highly specific computing as a service.
The Role of the Database?
Of the service categories listed above that are adopting these core principles, serverless databases will probably have the most impact, mostly because the role they play in an application stack requires them to be always accessible and as real-time as possible. Additionally, startup applications can no longer afford not to scale when the moment of fame comes; it’s either a hit or a miss. Distributed databases with a solid failover strategy fit the bill perfectly, yet the distributed aspect often makes them complex to deploy and maintain. Serverless databases address this struggle by taking operations out of your hands. In essence, serverless databases are databases that do not require provisioning or server management and offer a pay-as-you-go API.
It’s very likely that the evolution of distributed databases towards serverless will be a catalyst for the adoption of serverless. Or rather, this evolution would lift the final hurdles to adopting serverless, since an application built entirely on functions would not provide the desired pay-as-you-go pricing or scalability if the database remained the bottleneck.
In fact, many problems related to FaaS can be traced back to the database. Traditional databases do not offer modern secured APIs, which forces you to shield the database from outside access. This is often done by placing the database in a Virtual Private Cloud (VPC) where you can hide away the database behind the backend. However, cold starts can suffer significantly when lambdas are operated in a VPC.
Another reason is that databases typically require a connection before any interaction. Setting up a connection takes time, and the number of connections is limited. Since serverless functions should be completely stateless, connections can’t be kept open indefinitely. The alternative, where each function establishes a connection, retrieves data, and closes the connection again, will add seconds to API calls. When many functions run concurrently, we can easily surpass the maximum number of database connections that can be open at the same time.
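To make the connection limit concrete, here is a minimal sketch of a burst of concurrent function invocations that each open their own connection. The SimulatedDatabase class, the limit of 100 connections, and the burst of 150 invocations are all made up for illustration; real databases and FaaS platforms behave in more nuanced ways.

```python
from typing import Tuple


class TooManyConnections(Exception):
    """Raised when the simulated database has no free connection slots."""


class SimulatedDatabase:
    """A toy stand-in for a database with a hard cap on open connections."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> None:
        if self.open_connections >= self.max_connections:
            raise TooManyConnections(
                f"limit of {self.max_connections} connections reached"
            )
        self.open_connections += 1

    def close(self) -> None:
        self.open_connections -= 1


def simulate_burst(db: SimulatedDatabase, concurrent_invocations: int) -> Tuple[int, int]:
    """Model a burst in which every function instance opens its own connection
    before any other instance has finished. Returns (served, rejected)."""
    served = 0
    rejected = 0
    for _ in range(concurrent_invocations):
        try:
            db.connect()
            served += 1
        except TooManyConnections:
            rejected += 1
    # The burst is over: every instance that got a connection releases it.
    for _ in range(served):
        db.close()
    return served, rejected


# Assumed numbers: a cap of 100 connections and a burst of 150 invocations.
served, rejected = simulate_burst(
    SimulatedDatabase(max_connections=100), concurrent_invocations=150
)
print(served, rejected)  # prints: 100 50
```

In this toy model, every invocation beyond the cap fails outright, which is the situation the paragraph above describes.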
A relatively easy solution to this problem would be to place an API on top of your database that handles connection pooling for you, but this would add complexity and latency and, depending on the approach, could reintroduce a single point of failure.
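A pooling layer of this kind could look roughly like the sketch below. FakeConnection, ConnectionPool, and handle_request are hypothetical names invented for this example, and the pool size of 5 is arbitrary; a production pooler would also need health checks, reconnect logic, and authentication.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def query(self, statement: str) -> str:
        return f"result of {statement!r}"


class ConnectionPool:
    """Keeps a fixed number of connections open and lends them out."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


pool = ConnectionPool(size=5)


def handle_request(statement: str) -> str:
    """What the pooling API would do for each incoming function call."""
    with pool.connection() as conn:
        return conn.query(statement)


# 150 requests are served, yet only 5 connections are ever opened.
results = [handle_request(f"SELECT {i}") for i in range(150)]
print(len(results), FakeConnection.opened)  # prints: 150 5
```

Note that the pool only works because the process holding it is long-lived and stateful. That is exactly the extra component that brings the added complexity, the extra network hop, and the potential single point of failure mentioned above.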
What Has Changed?
Databases have many complex optimization and data consistency problems to tackle. Therefore, they have been evolving more slowly than the other elements of the serverless stack. Articles such as the Serverless Database Wish List show that the developer community is waiting for databases to evolve to a point where they can work in perfect harmony with a serverless stack. A database that works in harmony does not suffer from the aforementioned connection problems, because it is connectionless and secure by default. Instead of opening connections to the database, the database becomes an API that ingests security tokens and delivers data. Ideally, such a database offers the same advantages as the rest of the stack, such as pay-as-you-go and auto-scaling. With the emergence of providers that push serverless to the edge, a database that is also geographically distributed would safeguard the low-latency advantages of these providers. Nowadays, databases that offer some (or all) of these features exist. Examples include FaunaDB, Azure Cosmos DB, Firestore, and DynamoDB.
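The idea of a database as an API that ingests security tokens can be sketched as a single self-contained HTTPS request. The endpoint URL, the token value, and the JSON query shape below are assumptions made up for this example; they do not correspond to the API of any of the databases named above.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration.
DB_API_URL = "https://db.example.com/query"


def build_query_request(token: str, query: dict) -> urllib.request.Request:
    """Build one self-contained request: the token travels with the call,
    so there is no session or open connection to keep between invocations."""
    return urllib.request.Request(
        DB_API_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical token and query shape.
request = build_query_request(
    token="secret-token-123",
    query={"collection": "users", "filter": {"id": 42}},
)

# Sending it would be urllib.request.urlopen(request); that step is left out
# here because the endpoint above does not exist.
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Because every call carries its own credentials, a function can issue it from a cold start without any setup step, and access control can be decided per token rather than per network boundary.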
New approaches like JAMstack are emerging where the idea is to eliminate the backend (as we know it) as much as possible. Such a stack tries to achieve the lowest possible latencies by deploying statically generated pages (compiled during deploy) on CDNs and augmenting them with dynamic elements (client-side) that receive their data from serverless APIs. This is rapidly becoming a popular approach since it requires no server configuration, is cheaper to host, and embraces the auto-scaling and pay-as-you-go benefits of CDNs and serverless.
More and more SaaS companies have seen this evolution and are filling the gap in the market by providing service APIs that take care of the tedious parts (e.g. authorization) or the parts that are difficult to achieve (e.g. long-running calculations) in such a stack. JAMstack developers can nowadays choose Auth0 for authentication, Cloudinary for media storage and manipulation, Algolia for search, and databases like FaunaDB to store and query dynamic data. The JAMstack community’s growth is accelerated by these services; whenever a new need arises in the community, a new Lego block appears in the form of a service API that can effortlessly be plugged into an existing JAMstack.
Many things have changed since the first releases of serverless. We have more experience, limits are being lifted, and the ecosystem is growing rapidly. For a long time, databases were not designed with a serverless stack in mind and therefore remained a bottleneck due to their security model and persistent connections. We are now approaching a tipping point as new serverless-friendly databases lift these limitations. Additionally, edge serverless and JAMstack enthusiasts are pushing latencies down even further by adopting globally replicated delivery (CDNs) and distributed computing frameworks (edge serverless). A few years ago, this would have made less sense for OLTP applications, since no good solutions existed to store consistent data in a distributed manner without an enormous amount of operational overhead. Today, globally replicated databases such as Amazon Aurora, FaunaDB, Spanner, and CockroachDB are offered as a service, and a few of those (FaunaDB, Spanner) even provide ACID guarantees on a global scale.
Serverless-friendly databases provide the last piece of the serverless puzzle, and globally replicated databases are the final frontier in creating a stack that is serverless and delivers, computes, and stores data at the edge.