Neon: Branching in Serverless PostgreSQL
Branching is available in Neon today, though getting there took a lot of hard work, he said, and it has evolved from an infrastructure feature into a developer workflow tool.
“In the architecture that Postgres has today … branching is just a hard feature to have,” he said. “It takes a next-generation architecture, storage architecture, to enable branching, because the key feature of branching is copy on write. That’s what git has, for example, when you create a branch, you basically moving a few pointers around. And that gives you an isolated, full copy of your data in a separate branch.”
It requires a tight integration of the file system and the database engine.
“The file systems that we have today, they don’t care about what runs on top, right? They don’t know that this is a database running on top of the file system, or is some other application running on top of the file system, and preserving all the transactional semantics when you create a branch, making it undetectable for the systems that currently run in production. And on top of that storage, that’s a really hard thing to do.”
Copy of the Data in a Sandbox
Billed as an open source alternative to AWS Aurora Postgres, Neon separates compute and storage. It completely rewrote storage, making branching possible in its Postgres-as-a-service platform.
Because the API between the lowest level of Postgres and the file system is relatively small, Neon intercepts the calls that read from and write to a local file system and redirects them as RPC calls into its cloud native storage. Its storage layer, custom-built for Postgres, distributes data across a cluster of nodes, offering near-unlimited capacity and savings from moving seldom-used data to low-cost tiers.
Virtually, a branch is a full copy of the data. Physically, it is copy-on-write: creating a branch changes pointers to the data rather than duplicating it, so it doesn’t double the required storage.
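The idea of a small storage interface whose calls can be redirected can be sketched roughly as follows. This is only an illustration in Python, not Neon's code: the `PageStore` interface, both store classes and the in-memory `make_fake_rpc` helper are hypothetical names invented for this example, and a real system would make network calls where the fake RPC uses a dictionary.

```python
from typing import Callable, Dict, Protocol, Tuple


class PageStore(Protocol):
    """The small read/write surface the engine depends on (hypothetical)."""

    def read_page(self, page_id: int) -> bytes: ...

    def write_page(self, page_id: int, data: bytes) -> None: ...


class LocalPageStore:
    """Stands in for pages kept on a local file system."""

    def __init__(self) -> None:
        self._pages: Dict[int, bytes] = {}

    def read_page(self, page_id: int) -> bytes:
        return self._pages[page_id]

    def write_page(self, page_id: int, data: bytes) -> None:
        self._pages[page_id] = data


class RemotePageStore:
    """Forwards the same two calls to a storage service through an RPC function."""

    def __init__(self, rpc: Callable[[str, Tuple], object]) -> None:
        self._rpc = rpc

    def read_page(self, page_id: int) -> bytes:
        return self._rpc("read_page", (page_id,))

    def write_page(self, page_id: int, data: bytes) -> None:
        self._rpc("write_page", (page_id, data))


def make_fake_rpc() -> Callable[[str, Tuple], object]:
    """In-memory stand-in for a network call to a storage cluster."""
    remote_pages: Dict[int, bytes] = {}

    def rpc(method: str, args: Tuple) -> object:
        if method == "write_page":
            page_id, data = args
            remote_pages[page_id] = data
            return None
        if method == "read_page":
            return remote_pages[args[0]]
        raise ValueError(f"unknown method: {method}")

    return rpc


def engine_roundtrip(store: PageStore) -> bytes:
    """Engine-side code is identical whichever store it is handed."""
    store.write_page(7, b"row data")
    return store.read_page(7)


local_result = engine_roundtrip(LocalPageStore())
remote_result = engine_roundtrip(RemotePageStore(make_fake_rpc()))
```

Because the engine only ever talks to the two-method interface, swapping the local store for the remote one requires no change to the calling code, which is the property the small API makes possible.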
“Physically, it’s just a pointer … pointers pointing at the same page. And as the pages get modified, only then do we go and create additional physical pages. So that’s how copy-on-write is built. And because that sits in the storage subsystem, it’s very non-trivial, slash impossible to build it inside Postgres itself. It runs on top of the file system, which Postgres has no influence over,” Shamgunov explained.
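A toy model can make the pointer behavior concrete. The Python sketch below is an assumption-laden simplification, not Neon's storage engine: the `Branch` class and its methods are hypothetical, pages are immutable `bytes` objects, and creating a branch copies only the table of references to them.

```python
from typing import Dict


class Branch:
    """A branch is a page table: page id -> reference to an immutable page."""

    def __init__(self, pages: Dict[int, bytes]) -> None:
        self.pages = pages

    def create_branch(self) -> "Branch":
        # Copy only the table of references; no page contents are duplicated.
        return Branch(dict(self.pages))

    def read(self, page_id: int) -> bytes:
        return self.pages[page_id]

    def write(self, page_id: int, data: bytes) -> None:
        # A write points this branch at a new page and leaves other
        # branches pointing at the old one.
        self.pages[page_id] = data


main = Branch({1: b"users v1", 2: b"orders v1"})
dev = main.create_branch()

# Right after branching, both branches point at the very same page object.
shared_before_write = dev.pages[1] is main.pages[1]

dev.write(1, b"users v2")  # only now does a second physical page exist
```

After the write, `main` still reads the original page 1, `dev` reads the new one, and page 2 remains a single object shared by both branches.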
Branching gives the user a full copy of the production data, but it’s a sandbox environment where users can experiment without affecting the main branch.
You can create a branch that includes all data up to the current time or up to an earlier point in time. Neon retains a seven-day history for a project’s branches as write-ahead log (WAL) records, which enables a point-in-time restore feature.
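How a retained log allows a branch at an earlier time can be illustrated with a small replay function. This is a minimal sketch under stated assumptions: `WalRecord` and `state_at` are hypothetical names, the log is a simple key-value change list ordered by log sequence number (LSN), and the seven-day retention window is not modeled.

```python
from typing import Dict, List, NamedTuple, Optional


class WalRecord(NamedTuple):
    lsn: int               # position in the log, increasing over time
    key: str
    value: Optional[str]   # None marks a delete


def state_at(wal: List[WalRecord], lsn: int) -> Dict[str, str]:
    """Rebuild the data as of `lsn` by replaying records up to that point.

    Assumes `wal` is sorted by ascending LSN.
    """
    state: Dict[str, str] = {}
    for record in wal:
        if record.lsn > lsn:
            break
        if record.value is None:
            state.pop(record.key, None)
        else:
            state[record.key] = record.value
    return state


wal = [
    WalRecord(1, "alice", "active"),
    WalRecord(2, "bob", "active"),
    WalRecord(3, "alice", "suspended"),
    WalRecord(4, "bob", None),
]

branch_now = state_at(wal, wal[-1].lsn)   # all data up to the current time
branch_earlier = state_at(wal, 2)         # data as it stood at LSN 2
```

Choosing a different cutoff is all it takes to get a branch from an earlier moment, which is also what makes a point-in-time restore possible.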
“It’s an incredibly safe way of developing software. It gives you a similar [to git] confidence to kind of mess around with your branch as much as you want because you can always reset it from the production branch, from the main branch. Once the feature is complete, you can roll things forward into the main branch,” he said.
It enables users to:
- Instantly back up the database.
- Run tests in disposable test-specific branches.
- Safely try out automated database migrations on production.
- Run analytics or machine learning workloads in isolation.
Or, if you decide to ditch everything you’ve done, it costs nothing because it’s serverless. Serverless means that developers don’t have to worry about right-sizing their application resources; they just add a connection string to the database. And with consumption-based pricing, Neon can scale down to zero.
The company makes one project free on its cloud service, with up to 10 branches, 3 gigabytes of storage in each branch and a shared computing instance with 1 gigabyte of RAM.
In December, it announced Branch Reset, which enables you to keep your branch updated with the latest schema and data from the main branch. It functions much like git reset --hard parent in git workflows. It comes with the caveat that it could overwrite some work in your branch.
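The caveat is easier to see in a toy model. In the Python sketch below, which is an illustration rather than Neon's implementation, each branch is a plain dictionary and the hypothetical `reset_from_parent` function makes the child match its parent while reporting which keys changed on the child side.

```python
from typing import Dict, Set


def reset_from_parent(child: Dict[str, str], parent: Dict[str, str]) -> Set[str]:
    """Make `child` match `parent`; return the keys whose child-side state changed."""
    changed = {
        key
        for key in child.keys() | parent.keys()
        if child.get(key) != parent.get(key)
    }
    child.clear()
    child.update(parent)  # the child gets its own copy of the parent's entries
    return changed


main = {"schema": "v2", "alice": "active"}
dev = {"schema": "v1", "alice": "active", "scratch": "temp"}

changed_keys = reset_from_parent(dev, main)
```

Here the reset brings `dev` up to schema `v2`, but the `scratch` entry that existed only on `dev` is gone, which is the kind of overwritten work the caveat refers to.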
It also introduced the IP Allow feature for Neon Pro Plan users, adding another layer of security to data. It enables users to restrict access to a branch to only the IP addresses they specify. Users create an IP allowlist that is applied to all branches by default, or they can apply it only to the project’s primary branch.
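The allowlist rule described above can be sketched with Python's standard `ipaddress` module. The function name and parameters are hypothetical, and two behaviors are assumptions of this sketch rather than documented Neon behavior: entries may be single addresses or CIDR ranges, and an empty allowlist means no restriction.

```python
import ipaddress
from typing import Iterable


def is_connection_allowed(
    client_ip: str,
    allowlist: Iterable[str],
    branch_is_primary: bool,
    primary_only: bool = False,
) -> bool:
    """Decide whether a client may connect to a branch."""
    entries = list(allowlist)
    if not entries:
        return True  # assumption: no allowlist means no restriction
    if primary_only and not branch_is_primary:
        return True  # the list only guards the primary branch
    address = ipaddress.ip_address(client_ip)
    # A bare address such as "192.0.2.10" parses as a single-host network.
    return any(
        address in ipaddress.ip_network(entry, strict=False) for entry in entries
    )


allowlist = ["192.0.2.10", "198.51.100.0/24"]

office_on_primary = is_connection_allowed("192.0.2.10", allowlist, branch_is_primary=True)
stranger_on_dev = is_connection_allowed("203.0.113.5", allowlist, branch_is_primary=False)
stranger_on_dev_primary_only = is_connection_allowed(
    "203.0.113.5", allowlist, branch_is_primary=False, primary_only=True
)
```

With the default setting the unknown address is rejected on every branch; with `primary_only=True` it is rejected only when it targets the primary branch.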
Vector Work too
Though more than 35 years old, Postgres remains popular. It’s developers’ most-used database, according to the Stack Overflow 2023 developer survey, which found Postgres to be the database of choice for 45.5 percent of developers, ahead of MySQL at 41 percent. It’s ranked No. 4 at DB-Engines, though, while MySQL is No. 2.
Shamgunov, who previously cofounded the real-time data analytics platform SingleStore (MemSQL), launched Neon in 2022 along with Postgres veterans Heikki Linnakangas and Stas Kelvich. Cloud provider Vercel announced a partnership with Neon in May, which, along with a similar deal with the online integrated development environment Replit, is driving growth for Neon.
It announced a $46 million series B funding round in August, bringing its total funding to $104 million.
The company also is actively involved in the development of the Postgres similarity search extension pgvector. Linnakangas, for one, has made several contributions to the project to improve performance, Shamgunov said. As a Postgres provider, the company takes the view that there’s no need for a separate vector database.
At the same time, like Postgres edge platform pgEdge, Neon is going beyond what pgvector provides, using an additional set of algorithms with its own vector extension called pg_embedding to help further improve accuracy. It provides vector similarity search in Postgres, using the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search.
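For readers new to vector search, the following Python sketch shows the problem being solved, not the HNSW algorithm itself and not pg_embedding's API. It is an exact, brute-force nearest neighbor search over made-up three-dimensional embeddings; an HNSW index approximates the same result by walking a layered graph instead of comparing the query against every stored vector. All names and data here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float], items: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the names of the k stored vectors closest to the query.

    Exact search: every stored vector is compared with the query.
    """
    ranked = sorted(items, key=lambda name: l2_distance(query, items[name]))
    return ranked[:k]


# Made-up embeddings for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

top_two = nearest([1.0, 0.0, 0.0], embeddings, k=2)
```

The exact search costs one distance computation per stored vector, which is why approximate indexes become attractive as a table grows.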
It also has logical replication in beta.