AWS Advances Relational Technologies with Open Source and Vector Database Support
Amazon Web Services renewed its commitment to the open source community by recently revealing it’s the first diamond sponsor of the MariaDB Foundation. The MariaDB Foundation is the international point of contact for MariaDB Server, which DB-Engines recently ranked as the third most popular open source database.
AWS’s sponsorship of MariaDB, which includes investing both technical and capital resources, is merely its latest effort to further the open source database movement. The hyperscale cloud services provider has made similar investments in other open source engines, including PostgreSQL, MySQL, and Redis.
It will continue along this trajectory for some time to come.
“We’ve announced extended support for a number of open source databases recently,” acknowledged David Nalley, AWS director of open source strategy and marketing. “We’ve found that our customers have been having a lot of pain points around upgrading. They have to do a lot of testing, particularly if they’re in a regulated industry. There are a number of audits they also have to go through every time they upgrade. You’re going to see a steady drumbeat [of open source contributions], especially in the walk up to re:Invent.”
As Nalley implied, AWS’s assistance to open source databases is designed to provide a more enterprise-ready experience for its managed database service customers. Much of its feature and maintenance work addresses a cross-cutting concern: improving the scalability of these databases.
However, AWS’s contributions are also intended to modernize these databases, extend the utility of relational technologies in what’s been deemed the age of Artificial Intelligence, and support contemporary (and future) forms of storing and retrieving data, including the use of relational technologies as vector databases.
Amazon Relational Database Service
According to Nalley, AWS has been the second or third-highest contributor to MariaDB for the past year. Specifically, the cloud provider has delivered new features and ongoing maintenance, much of which is aimed at making the database more scalable. As one of the top three public cloud providers, AWS has a helpful vantage point from which to evaluate the scalability of MariaDB and other open source options running in Amazon Relational Database Service (RDS). “One of the things AWS gets to bring to the table when we’re having conversations around building software is, simply, our scale gives us a lot of opportunities to see things break at that scale,” Nalley said.
Conversely, it also presents opportunities to see what’s required to prevent them from breaking in the future. RDS was conceived of as a means of providing what Nalley called “guardrails” for operating open source databases and outsourcing production complexities to AWS. “We’re providing you with resources and the frameworks to know that your backups are good, and to know you’re running something in a secure way so that you don’t have to worry about that operational overhead,” Nalley commented. “After that, folks are really interested in things like replication, so that they can do very high, concurrent reads across a number of different database endpoints.”
A Relational Vector Database
Customers are also interested in vectorizing content, making it searchable, and applying it to numerous deployments of Generative AI. Some of AWS’s leading contributions to PostgreSQL, another open source database supported by RDS, enable just that through the pgvector extension. Although AWS isn’t the sole contributor to this extension, it has been a significant one, which matters for several reasons.
Firstly, it belies the notion that open source relational databases are confined to the realm of transactions. Also, it extends the shelf life of relational technologies by making them relevant, if not desirable, for Generative AI applications. Most of all, it makes vector search and applications of Generative AI much more accessible by bringing them into established tools like PostgreSQL. “Folks are storing a lot of data and weights inside PostgreSQL and have found that PostgreSQL, plus this pgvector extension, makes it a really compelling vector database, instead of going to a purpose-built,” Nalley revealed.
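To make the idea of vector search inside a relational database more concrete, the following is a minimal sketch rather than anything AWS or Nalley described. The SQL in the comments follows the pattern shown in the pgvector README; the table name, the three-dimensional vectors, and the Python helper names are all assumptions made for illustration. The Python code simply mimics, in memory, the cosine-distance ordering that pgvector's `<=>` operator performs.

```python
import math

# Roughly equivalent pgvector SQL, shown for orientation only
# (the table "items" and its columns are hypothetical):
#   CREATE EXTENSION vector;
#   CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
#   SELECT id FROM items ORDER BY embedding <=> '[1,0,0]' LIMIT 2;


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity the <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, k):
    """Return the ids of the k rows whose embeddings are closest to query."""
    ranked = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [row_id for row_id, _ in ranked[:k]]


# Hypothetical rows standing in for a table: (id, embedding)
ITEMS = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.9, 0.1, 0.0]),
]

print(nearest(ITEMS, [1.0, 0.0, 0.0], k=2))  # prints [1, 3]
```

In a real deployment the ordering would be done by the database, typically with an index, instead of sorting every row in application code.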
Third Wave of Databases
The ramifications of this development are nontrivial, both for open source databases and for the notion of databases in general. At present, PostgreSQL is the only relational option Nalley is familiar with that has an extension for storing vectors. Even so, “We’re going to see a reevaluation of a lot of the way we store data because of the way various Gen AI and other AI models are consuming data,” Nalley promised. The differences are best understood according to the consumption models of the most impactful database types.
- Relational databases: According to Nalley, relational databases typically rely on a key (that’s either primary or foreign, depending upon the relation to other tables), which drives the way their data are interrelated. Consequently, “You’re saying you’ve got a structure; you’re going to define this structure and grab data, push it into this structure, and then query it because we know what it’s [the structure’s] going to be,” Nalley explained.
- NoSQL Databases: Although there are different paradigms for NoSQL databases, the document-based variety (which Nalley characterized as “more generic” NoSQL databases) “say, ‘we’re not necessarily going to follow that structure’,” Nalley remarked. “We want it to be more freeform and we’ll figure out how to query that a little more holistically.”
- Vector Databases: One of the hallmarks of document stores is that they’re more flexible for ingesting different types of data without needing to normalize those data to an existing relational structure. Vector databases extend this notion. “Instead of there being the idea of a primary and foreign keys, we’ve got the concept of we got all this data, and we’re needing to make inferences based on the collection of data as a whole, and figure out what to send as responses,” Nalley said. “It’s another step away from what you consider relational traditionally, but still allows you to use a relational database underlayment.”
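The three consumption models in the list above can be contrasted with a small, purely illustrative sketch. None of the data, field names, or functions below come from AWS or Nalley; they are hypothetical stand-ins meant only to show how a keyed lookup, a freeform document filter, and a similarity ranking differ in shape.

```python
# Relational: a fixed structure, with rows related through primary and foreign keys.
customers = {1: {"name": "Ada"}}                        # primary key -> row
orders = [{"id": 10, "customer_id": 1, "total": 25.0}]  # customer_id is a foreign key


def orders_with_customer(order_rows, customer_rows):
    """Join each order to its customer by following the foreign key."""
    return [(o["id"], customer_rows[o["customer_id"]]["name"]) for o in order_rows]


# Document: freeform records that need not share a structure.
documents = [
    {"name": "Ada", "tags": ["vip"]},
    {"title": "Invoice 10", "total": 25.0},
]


def find_by_field(docs, field):
    """Return every document that happens to contain the given field."""
    return [d for d in docs if field in d]


# Vector: no keys at all; the whole collection is ranked by similarity to a query.
embeddings = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0]}


def most_similar(vectors, query):
    """Return the name of the stored vector with the largest dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(vectors, key=lambda name: dot(vectors[name], query))


print(orders_with_customer(orders, customers))
print(find_by_field(documents, "total"))
print(most_similar(embeddings, [0.2, 0.9]))
```

The point of the contrast is that the first query depends on knowing the structure in advance, the second tolerates records with different shapes, and the third answers by comparing against the collection as a whole.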
An Ongoing Commitment
Perhaps the ultimate demonstration of AWS’s commitment to advancements in the open source community is that its interest in the subject isn’t limited to relational options. The provider’s support for Redis, a key-value store in the NoSQL family of databases, underscores this fact. AWS has one of five maintainers (a decision-making authority on an open source project who sets its future technical direction) and two committers (someone who can push source code directly into the repository) on what Nalley called “the Redis project”. Combined with its continuing efforts on MariaDB, PostgreSQL, and RDS, this involvement emphasizes the company’s resolve to make these options more suitable for the enterprise.