re:Invent Notes: Purpose-Built vs. General-Purpose Databases

14 Dec 2021 6:01am, by

Dominic Wellington
Dominic is a director at MongoDB and works closely with developers and customers to modernize their technology stacks. With over 20 years of experience in the industry, he has worked in numerous different roles including engineering, architecture, sales and marketing.

In the past decade, AWS re:Invent has become the biggest event of the year, at least for anybody working on, or even near, the cloud. This year, it felt even bigger, because we had all missed having an in-person event last year.

This year was also different in another way: It was the first re:Invent without the endless cavalcade of products in Andy Jassy’s keynote, interspersed with the “musical” interludes from the re:Invent House Band. (Jassy was promoted to president and CEO of Amazon earlier this year.)

Instead, new AWS CEO Adam Selipsky put his stamp on the event with a very different sort of keynote, focusing much more on storytelling in general and on AWS customer stories in particular.

The lack of blockbuster announcements made it difficult to find the theme of re:Invent this year. There were a lot of innovations announced, to be sure, but they were almost all sustaining innovations: new iterations of existing ideas. The Graviton3 processor looks like it will blow the doors off everything else, but the clue is in the name: It’s the Graviton2, improved.

As the week continued, I saw one trend begin to emerge as keynote after keynote featured variations on the theme of “one database for each use case” and boasting about AWS having the “widest array of purpose-built databases.” But is this what users actually want?

Architectural Complexity

The first problem is that few use cases actually fit into just one of these narrow special-purpose database engines. Most real-world examples, and even AWS’ own reference architectures, feature several different databases, often connected by a service named, appropriately enough, AWS Glue. Data is stored in one place for one sort of query, then some or all of the data is copied elsewhere to support a different type of query; and at some point, roll-up queries might be necessary, featuring yet another database engine. Prospective users might be forgiven for throwing up their hands in confusion and despair.

Below is an example of the architecture for a consumer-facing application at a Fortune 500 company. Looking at it, you can quickly grasp the challenges of learning, managing and supporting all those different technologies and having data sitting in all those different silos. Creating, backing up and synchronizing all of that data is an operational nightmare.

Example architecture for a typical consumer-facing application from a Fortune 500 customer.

Control and Security

The negative consequences of excessive complexity are not limited to higher costs, whether in terms of the AWS resources themselves, the differing licensing models involved or the multifarious skill sets required to understand each component in the architecture diagram, along with its interactions (desired and undesired) with all the other components.

There is also a second-order consequence. When data is copied from one system to another, the operators of the first system lose control of it. There is no way to grant limited rights to a copy of the data set. Once you have your own copy, you can do whatever you want with it, including actions which might be outside the permissions that have been granted by users or by law. Plus, each copy needs to be secured individually, using the security controls that are native to each database engine.

Each duplication multiplies the attack surface that defenders have to worry about, whether against outside attackers or against well-meaning insiders straying outside the boundaries set in policies and regulations.
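To make the point about copies concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a model of any real AWS service: `Store`, `copy_rows`, the role name and the field names are all hypothetical, and each in-memory `Store` simply stands in for one database engine with its own, separately configured access rules.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for one database engine with its own access rules."""
    name: str
    # Maps a role to the set of fields that role may read in THIS store only.
    readable_fields: dict = field(default_factory=dict)
    rows: list = field(default_factory=list)

    def read(self, role):
        """Return every row, filtered down to the fields this role may see."""
        allowed = self.readable_fields.get(role, set())
        return [{k: v for k, v in row.items() if k in allowed} for row in self.rows]


def copy_rows(source, target):
    """Copy the data only; the source's access rules do not travel with it."""
    target.rows = [dict(row) for row in source.rows]


primary = Store(
    name="primary",
    readable_fields={"analyst": {"order_id", "total"}},
    rows=[{"order_id": 1, "total": 40, "email": "a@example.com"}],
)

# The analytics copy is configured separately and, in this example, too permissively.
analytics = Store(
    name="analytics",
    readable_fields={"analyst": {"order_id", "total", "email"}},
)
copy_rows(primary, analytics)

in_primary = primary.read("analyst")  # email is filtered out by the primary's rules
in_copy = analytics.read("analyst")   # email is visible through the copy
```

The restriction on `email` lives in the primary store's configuration, so nothing enforces it on the copy. Every additional store is one more set of rules that has to be written, reviewed and kept in step with the others.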

Developer Fatigue

Developers’ jobs are unique in IT in that they are not asked to produce a strategy. Rather, they’re expected to produce software that executes the strategy set by executives. Once a use case spans multiple database engines, it becomes that much harder for an individual developer to understand each one to the same depth.

Developers who might previously have had at least a working knowledge of the whole use case now find themselves shut out of many areas — unless they invest their valuable time in acquiring the skills required to work with yet another specialist data store. As RedMonk’s Stephen O’Grady noted recently, “While no one wants to return to a world where the only realistic option is relational storage, the overhead today of having to learn and interact with multiple databases has become more burden than boon.”

This difficulty widens the gap between developers and the line-of-business users of the application into an echoing chasm. Such specialization also creates a minimum threshold below which new requirements are not viable. It takes a certain amount of expected value for it to be worthwhile standing up a whole new database and figuring out how to interface it with existing systems. Below that level, development of the requested feature is either not viable or it is restricted to generic constructs available in other systems. The result is dissatisfied end users who have once again been denied a feature they require for their own work.

Adam Selipsky and AWS evidently see an extensive catalog of database engines as a positive, with an option in there specialized to fit every need. The space between that level and the level actual users work at can, in this view, be filled by path-finding stories. I am not so confident that this is the case.

Photo by Brett Sayles from Pexels.