
Programming in 2039: How Persistent Memory Will Change Databases

8 Feb 2019 1:34pm, by Kyle J. Davis, Technical Marketing Manager, Redis Labs

Kyle J. Davis is the technical marketing manager at Redis Labs. Kyle is an enthusiastic full-stack developer and works frequently with Node.js and Redis, documented in his long-running blog series on the pair. Previously, Kyle worked in higher education, helping to develop tools and technologies for academics and administrators.

I started university in 1999. That year, I learned SQL. I remember imagining my little application on a server, where one line of SQL triggered a chain of amazing events. The query language issued commands to a disk controller, which moved an arm across the disk. The head picked up the data previously written to the magnetic media. The data pulsed back across a wire through the controller, up through the OS, and into my software. This all happened in mere seconds.

That was about 20 years ago. A student this year will have a very different experience — everything is different. The micromechanical parts of the spinning media have been replaced by SSDs. SSDs are solid state; they don’t have motors or actuator arms, just silent flash memory. Dig in a little deeper, however, and they still emulate the mechanical bits of the spinning disk. Databases and file systems are still designed for the world of spinning disks — most database software is specially designed to provide persistence within the mechanical limitations of moving media. That is pretty antiquated today.

Now fast forward to 2039, twenty years into the future. I’m sure the things we do today will seem as silly as dial-up. But I am not a futurist; I’m a database guy. I think about data and how we store and retrieve it.

With persistent memory technology becoming a reality today, applications are being freed from the constraints imposed by physical media. Things begin to get blurry as our conception of what a database does shifts. Persistent memory behaves more like RAM than anything else. The concept of files also becomes less important, since the file system (another relic of the spinning-disk era) is no longer a must for power-off persistent data.

With these thoughts in mind, databases, without the burden of spinning media, are a little different. Here is a high-level shortlist for the in-memory future:

  • Clustering — Persistent memory will (at least initially) be more expensive than SSDs, so for even moderately sized workloads, a single dataset will still need to span many machines. This should happen on the smallest number of machines that can safely provide data durability.
  • Optimization of the protocol and network — When you eliminate entire classes of bottlenecks from a system, things like the network become very evident. A protocol with very low overhead, plus persistent connections between client and server that can be accessed asynchronously, ensures that the advantages of in-memory data storage are not lost.
  • High availability — While high availability is often needed even in a disk-based system, the higher throughput of in-memory systems means that even short outages can mean billions of requests not served.
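To make the protocol point above concrete, here is a toy sketch of pipelining over one persistent connection: the client sends a whole batch of commands before reading any replies, paying the network round trip once rather than once per command. The newline-delimited wire format and `OK` responses are invented for illustration and don’t correspond to any real database protocol.

```python
import socket

def run_pipelined(commands):
    # A socketpair stands in for one long-lived TCP connection
    # between a client and a database server.
    client, server = socket.socketpair()

    # Client: write every command up front, without waiting between them.
    client.sendall("".join(cmd + "\n" for cmd in commands).encode())
    client.shutdown(socket.SHUT_WR)  # signal "batch complete"

    # Server: read the whole batch, then answer each command in order.
    request = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        request += chunk
    replies = ["OK " + line for line in request.decode().splitlines()]
    server.sendall("".join(r + "\n" for r in replies).encode())
    server.close()

    # Client: read all replies in a single pass over the connection.
    response = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        response += chunk
    client.close()
    return response.decode().splitlines()

replies = run_pipelined(["SET a 1", "SET b 2", "GET a"])
```

The point is not the fake protocol but the shape of the exchange: when storage latency approaches zero, the per-command round trip dominates, and batching amortizes it away.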

Additionally, the architecture of the software written in 2039 will be very different. Right now we have very rigid lines between services that provide data in different ways. You might have a single database that serves relational queries. Today we can build applications that don’t always need relational data, relying instead on established NoSQL concepts. Yet this is usually done only when performance is at a premium, with teams defaulting to some relational database to provide persistence and rich data access. When you can provide persistent memory and a way of operating on a single piece of data in different models, the traditional relational database is relegated to some very specific uses.

Data storage fundamentals shift with hardware

In years past, the relational model was extremely successful. You could reason about many problems and fit them into normalized tables that could be manipulated and queried. This worked great, but if you had a simpler problem to solve, say, getting an item by its primary key, much of the same complexity had to be summoned: queries, tables, schemas and so on. NoSQL and, more specifically, key/value stores made this approach seem ridiculous.
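A toy comparison makes the contrast visible. Both snippets below fetch one item by its primary key; the relational route needs a schema, a table and a parsed query, while the key/value route is a single lookup (a plain Python dict stands in for a key/value store here, purely for illustration).

```python
import sqlite3

# Relational: schema + SQL machinery, even for a primary-key fetch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id TEXT PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO items VALUES (?, ?)", ("user:42", "Ada"))
row = db.execute(
    "SELECT body FROM items WHERE id = ?", ("user:42",)
).fetchone()

# Key/value: the same fetch as one direct lookup.
kv = {"user:42": "Ada"}
value = kv["user:42"]
```

Both return the same answer; the difference is how much ceremony the data model demands for a one-key question.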

As we move into the future, the fundamental concept of data being stored will shift from a particle of ferromagnetic material flipping polarity to an unbelievably microscopic layer of silicon that is directly addressable and can be quickly manipulated and read. Because the hardware is changing, so should the way we use it.

Indeed, similar patterns appear with other data models. Time series data is quite simple, needing only lightweight ingestion alongside a minimal schema; yet time series data in a relational database carries all of the relational baggage. Graph data is especially poorly suited to being implemented on top of a relational model, forcing awkward cross-table joins to achieve ad hoc relationships between graph nodes.

Out of this frustration rose the variety of special-purpose databases of the NoSQL world, each providing very good access in its own way. However, this had its own challenges. Each database had to be administered by someone and had different characteristics when it came to scaling, monitoring and protection. In addition, these databases couldn’t speak to each other in meaningful ways, placing a burden on the applications that depended on them.


As the database approaches the CPU addressability of DRAM while the data is retained through power-down, it becomes clear that our applications in 20 years will treat data as close and fast, more akin to a variable inside a program than a distant database.
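The "data as a variable" idea can be sketched today with a memory-mapped file: durable state is updated in place through what looks like ordinary memory access. This is only an approximation — a real persistent-memory setup would map a DAX device and use cache-flush instructions rather than an ordinary temp file — but the programming shape is the same.

```python
import mmap
import os
import struct
import tempfile

# Durable state: one 64-bit counter living in a file, initially 0.
# (The temp file stands in for a persistent-memory region.)
path = os.path.join(tempfile.mkdtemp(), "counter.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", 0))

def bump(path):
    """Increment the counter in place, as if it were a program variable."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 8) as m:
            n = struct.unpack_from("<Q", m)[0]
            struct.pack_into("<Q", m, 0, n + 1)  # write through the mapping
            m.flush()  # persist, loosely analogous to a pmem cache flush
            return n + 1

bump(path)
bump(path)

# The value outlives every mapping: reopen the file and it is still there.
with open(path, "rb") as f:
    survived = struct.unpack("<Q", f.read())[0]
```

No serialization layer, no query, no file format beyond the bytes of the value itself — which is roughly how persistent-memory-aware code is expected to treat durable data.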

Data comes in as whatever model is most convenient and performant; the database itself can then be aware of that data, atomically manipulate it and perform operations on it as well. The data changes models and can either replace the original form or coexist with it. The new data can be retrieved instantly by the application as needed, rather than having to perform gymnastics over a single relational model. No longer is there worry about how to scale a specialized database; data instead is manipulated at the most basic level. You will still have the traditional problems of clustering, protocol optimization and high availability to solve, but the locality of processing and the malleability of the data inside the database layer have eliminated a whole class of problems.

In 2039, I don’t know if we’ll be using jetpacks. However, I’m pretty sure that we won’t be using databases or writing applications in a way that would be familiar to the 1999 version of myself.

