
MongoDB Builds out Its Full Platform Play

At its New York and London events this year, MongoDB announced capabilities that build its full developer platform, and not just the core database.
Nov 1st, 2023 11:48am by

MongoDB, the company behind the document database of the same name, has been holding a series of one-day roadshow events, aptly called “.local,” around the world this year. Two of these roadshow conferences, in New York on June 22 and in London on Sept. 26, served as launch vehicles for a set of capabilities covering everything from streaming data to generative AI. Now that the paint has dried a bit on these announcements, I wanted to analyze the lot of them. But first, I’ll set some context.

When it first emerged in 2009, MongoDB was a scrappy and rather revolutionary database. Its challenge to incumbent database platforms seemed naive and quixotic to some, but developers found MongoDB’s use of JSON and the document model appealing and long overdue.

Over the years, MongoDB matured and joined the ranks of enterprise databases, effectively entering the incumbent cohort after having successfully challenged it. With the 2016 launch of managed cloud service MongoDB Atlas, the company began to compete not only with the incumbent database platforms but also with the cloud providers that leveraged them. As a result, MongoDB has become enterprise-focused itself, while maintaining its developer appeal.

So what happens now? Will MongoDB rest on its laurels, going into a sort of maintenance mode, merely adding new features to stay competitive and keeping up its developer appeal out of loyalty? It doesn’t appear that way. Sure, with this year’s announcements, there’s continued emphasis on the core platform, a growing focus on Atlas, and an understandable adoption of AI.

But rather than just bolting on new capabilities, MongoDB has figured out how to offer its customers a platform that lets them leverage existing skillsets and take advantage of new technologies as they emerge. To understand how MongoDB is doing this, let’s look at the details of its various announcements now.

AI for the Dev’s Eye

To begin with, like many vendors, MongoDB is embracing AI, both by adding AI-driven natural language processing (NLP) features to the platform’s tooling and by adding a major capability to the platform itself to help developers build AI applications. For example, supplied with natural language phrases, MongoDB Compass can now generate queries and aggregations in its own query language.

Similarly, MongoDB Atlas Charts can create data visualizations from natural language requests, and a new AI chatbot feature can leverage MongoDB’s documentation to provide technical answers to natural language questions. All of these capabilities were announced in London; the documentation chatbot is generally available, while the other natural language capabilities are in preview.

Relational Migrator allows developers to migrate applications that have previously run on relational database platforms. In New York, MongoDB announced the general availability of Relational Migrator, and its ability to migrate relational databases, tables and data.

In London, the company upped the ante, announcing that Migrator can now leverage generative AI to port SQL queries to the MongoDB Query API. The result is that the embedded queries and stored procedures can migrate as well as the data itself. The query conversion capability is in private preview.
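To make the idea of query porting concrete, here is a hand-written sketch of a SQL query next to an equivalent aggregation pipeline. It is not output from Relational Migrator, and the `orders` table and its `status` and `total` fields are hypothetical. The small `run_pipeline` helper is also an assumption for illustration: it evaluates only the two stage shapes used here, in memory, so the example runs without a database.

```python
# Hypothetical SQL that a relational application might embed today.
SQL_QUERY = """
SELECT status, COUNT(*) AS count
FROM orders
WHERE total > 100
GROUP BY status
"""

# A hand-written equivalent expressed as a MongoDB aggregation pipeline.
PIPELINE = [
    {"$match": {"total": {"$gt": 100}}},
    {"$group": {"_id": "$status", "count": {"$sum": 1}}},
]


def run_pipeline(docs, pipeline):
    """Tiny in-memory evaluator for the $match/$gt and $group/$sum:1
    shapes used above. Not a general MongoDB emulator."""
    for stage in pipeline:
        if "$match" in stage:
            # Exactly one field condition is expected in this sketch.
            (field, cond), = stage["$match"].items()
            docs = [d for d in docs if d[field] > cond["$gt"]]
        elif "$group" in stage:
            key = stage["$group"]["_id"].lstrip("$")
            counts = {}
            for d in docs:
                counts[d[key]] = counts.get(d[key], 0) + 1
            docs = [{"_id": k, "count": v} for k, v in counts.items()]
    return docs


sample_orders = [
    {"status": "shipped", "total": 250},
    {"status": "shipped", "total": 40},
    {"status": "pending", "total": 120},
    {"status": "shipped", "total": 180},
]
result = run_pipeline(sample_orders, PIPELINE)
```

Against a real deployment, the same `PIPELINE` list could be passed to a PyMongo collection's `aggregate()` method instead of the helper.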

Taken together, these features are a boon to developer productivity. On the one hand, the ability to use natural language for queries and visualization means some coding tasks can be automated, either substantially or fully.

For tasks that can’t or shouldn’t be automated, putting an NLP interface onto MongoDB’s technical documentation means that even for more manual coding approaches, AI can still be used to accelerate development and the learning necessary for it. And Relational Migrator’s use of AI not only eases the migration of applications but provides developers with a before-and-after set of database and query assets that helps migrate their skillsets as well.

While these AI features will not impact users of MongoDB applications directly, helping developers be more productive on that platform helps users get applications with more capabilities, and get them more quickly. Developers should also be able to iterate on and improve those applications more quickly, which is another win for downstream users.

But it’s also a victory for developers, who should be able to spend a greater percentage of their time on the creative parts of development, and less on the tedium involved in adding new capabilities. That, in turn, should enhance their morale and allow them to learn more about the platform, further accelerating application improvement and quality.

Vector Search Puts AI in Apps

On the core platform side, the changes are arguably even more significant. Specifically, the addition of Atlas Vector Search (as a public preview) to the platform means that developers can put AI capabilities in their applications, in addition to using AI while developing them. Let’s explore why.

A “vector” is a special numeric encoding of unstructured data, typically derived from documents or media content. A vector search capability, aided by special vector indexes, allows for a kind of fuzzy search to compare vectors to each other and thereby find pieces of content that are similar or relevant to one another. That’s what Vector Search brings to the platform, and that’s very useful by itself, but in the world of generative AI, it’s even more so.
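As a rough illustration of what "comparing vectors" means, the sketch below ranks a few items by cosine similarity to a query vector. The three-dimensional vectors and titles are made up for the example; real embeddings have hundreds or thousands of dimensions, and a vector index would typically use approximate nearest-neighbor search rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, purely for illustration.
CATALOG = {
    "intro to document databases": [0.9, 0.1, 0.0],
    "json schema design tips": [0.8, 0.3, 0.1],
    "sourdough baking basics": [0.0, 0.2, 0.9],
}


def most_similar(query_vector, catalog, k=2):
    """Return the k titles whose vectors are closest to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in catalog.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


top = most_similar([1.0, 0.2, 0.0], CATALOG)
```

Here the two database-related titles score far higher than the baking one, even though no keywords are compared at all.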

Vectors allow a sort of second check on large language models (LLMs — the workhorse of generative AI), through a process called Retrieval-Augmented Generation (RAG).

By storing vector representations of content in a repository, content relevant to a query can be identified. Next, the query and those relevant documents can be sent to an LLM, garnering a more customized and potentially more accurate response from the LLM than if the query were sent alone. This gives the effect of customizing the LLM without actually retraining or fine-tuning it, so it can generate answers that are more contextual to a given organization and its data than the model could alone.

And by grounding the LLM in such contextual data, RAG can also help reduce so-called “hallucinations” (nonsensical or incorrect results). Meanwhile, the underlying LLM can be used without modification, and without fear that the corporate data used to ground its responses would somehow be absorbed into the model and shared with the wider world.

The process of finding the relevant documents is where vector search comes in. While MongoDB doesn’t generate vectorized embeddings, it can store them, and can then perform similarity searches across vectorized data stored in the database, based on a vectorized representation of a query. And since existing LLM frameworks can be used to generate the embeddings for both, MongoDB can be leveraged for storing and searching them.
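For a sense of what such a similarity search can look like in practice, here is a minimal sketch that builds an aggregation pipeline around the `$vectorSearch` stage. The index name, the `embedding` field and the `title` field are hypothetical, and since the feature is in preview, the stage's options should be checked against current MongoDB documentation rather than taken from this example. The query vector is assumed to come from whatever embedding model produced the stored vectors.

```python
def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 path="embedding", limit=5,
                                 num_candidates=100):
    """Build an aggregation pipeline for an Atlas Vector Search query.

    index_name and path are hypothetical defaults; they must match an
    index and field that actually exist in the target collection.
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Number of nearest-neighbor candidates considered
                # before the top `limit` results are returned.
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# A made-up, very short query vector for illustration only.
pipeline = build_vector_search_pipeline([0.12, -0.03, 0.88], limit=3)
```

With PyMongo, the resulting list would be passed to a collection's `aggregate()` method on an Atlas cluster that has a matching vector index.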

So, to summarize:

  • Content used to ground an LLM’s responses can be efficiently stored in MongoDB as vector embeddings.
  • Queries for the LLM can be vectorized as well.
  • Atlas Vector Search can be used to find content in the database that’s relevant to a query, which can then be supplied to the LLM through RAG to yield more accurate responses to the query and reduce hallucinations.
  • And, perhaps most important, MongoDB Atlas can integrate this entire practice into the mainstream enterprise application development process.
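The steps summarized above can be sketched end to end in a few lines. Everything here is a stand-in: `embed` is a toy word-count function rather than a real embedding model, the in-memory `store` list plays the role of the database, and the final call to an LLM is left out, so the sketch stops at the prompt that would be sent. The point is only the shape of the flow: embed the content, embed the question, retrieve the closest content, and hand both to the model, which itself is never modified.

```python
VOCAB = ["refund", "shipping", "password", "reset", "days"]


def embed(text):
    """Toy stand-in for an embedding model: word counts over VOCAB."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return [words.count(term) for term in VOCAB]


def retrieve(question, documents, k=1):
    """Rank stored documents by dot product with the question's vector."""
    q = embed(question)

    def score(doc):
        return sum(a * b for a, b in zip(q, doc["embedding"]))

    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(question, context_docs):
    """Combine retrieved content and the question into one prompt."""
    context = "\n".join(f"- {d['text']}" for d in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


texts = [
    "Refund requests are accepted within 30 days.",
    "Shipping takes 5 business days.",
    "Use the reset link to change your password.",
]
# Store each piece of content alongside its embedding.
store = [{"text": t, "embedding": embed(t)} for t in texts]

question = "How do I reset my password?"
prompt = build_prompt(question, retrieve(question, store))
```

In a real application, the retrieval step is where a vector search against the database would replace the `retrieve` function.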

Not only is all of this convenient, but it’s efficient, because it avoids the need for a specialized vector database point solution to be added to the dev stack. There are a number of such specialized platforms on the market already, but a few enterprise database players feel that vector capabilities should be a feature set, rather than a standalone product category. Obviously, MongoDB is one such player.

In Search of Search

Arguably, though, MongoDB is better suited to vector storage and search workloads than other databases. Why? Because its platform is based on the document data model. Storing data in JSON form already permits structured and semi-structured data to be commingled in the database. And since vectors are a way of structuring the unstructured, storing vector embeddings adds unstructured data to the mix. Vector search complements more conventional search to make all of the data (structured, semi-structured and unstructured) searchable, with RAG and LLMs helping deliver a return on that universal search investment.

That’s why Atlas Search is there for conventional/relevance search and Atlas Vector Search is now on-board for AI-focused search.

While MongoDB does not limit its platform and business model to search, Atlas Search as a discrete component shows the company takes search seriously as a dedicated workload. With that in mind, another announcement MongoDB made in New York is the ability to create dedicated Search Nodes.

Search nodes serve as dedicated infrastructure for search use cases that can be scaled independently of nodes servicing more general database services. That’s important for search overall, but especially with the addition of vector search which, for AI-intensive applications, may introduce acute resource requirements that need to be accommodated at scale. And, although Atlas Search is a discrete component, with its own significant engineering investment, it’s part of the same platform around which MongoDB customers have made investments and built skill sets.

Streaming Consciousness

Next up: streaming data. While not AI-related, there are parallels with bringing streaming data processing and LLM queries into the modern software development process. Both disciplines are important to full utilization of data in a business context, yet both are somewhat orthogonal to more mainstream data development in terms of required skills and programming constructs. While specialized platforms such as Apache Kafka exist to ingest, process, and manage streaming data in powerful ways, vendors who can make those technologies accessible to techies with mainstream developer skillsets will be the ones that earn hero status. MongoDB is certainly trying here.

Atlas Stream Processing, the private preview for which was also announced in New York, is MongoDB’s technology for making this integration of streaming data a reality. MongoDB says the technology does this by “unifying how developer teams work with data-in-motion and data at rest.” That makes sense, since working with data at rest is a mainstream developer skill. In other words, if Atlas Stream Processing can make streaming data look like a special case of data at rest, developers get a huge head start on the expertise required to bring streaming data processing into their applications.
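One way to picture streaming data as "a special case of data at rest" is a windowed aggregation: events are grouped into fixed time windows, and each window yields an ordinary document that can be queried like any other. The sketch below is a conceptual illustration in plain Python, not the Atlas Stream Processing API; the event shape (`ts`, `sensor`, `value`) is hypothetical, and a real stream processor would handle unbounded input incrementally instead of a finite list.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds=60):
    """Group events into fixed, non-overlapping time windows and
    compute the average value per sensor in each window."""
    buckets = defaultdict(list)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - (event["ts"] % window_seconds)
        buckets[(window_start, event["sensor"])].append(event["value"])
    return [
        {"window_start": start, "sensor": sensor,
         "avg": sum(values) / len(values)}
        for (start, sensor), values in sorted(buckets.items())
    ]


# Timestamps are seconds since an arbitrary starting point.
events = [
    {"ts": 0, "sensor": "a", "value": 10.0},
    {"ts": 30, "sensor": "a", "value": 20.0},
    {"ts": 65, "sensor": "a", "value": 40.0},
    {"ts": 70, "sensor": "b", "value": 5.0},
]
windows = tumbling_window_averages(events)
```

Each entry in `windows` is a plain document, which is the sense in which the output of stream processing can be treated with the same skills as data at rest.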

Further Connections

And because much of the data that comes in streaming form (be it from IoT devices, sensors, web logs or financial markets) comes with date and time stamp information, Atlas Stream Processing integrates with MongoDB Time Series collections to facilitate analysis and processing of such data at high volumes. Scaling enhancements for Time Series collections were also announced in New York, as was the ability to modify time series data post-ingestion.

In London, MongoDB announced extras in this arena, including Atlas Stream Processing tie-ins with Atlas Vector Search. This integration allows streaming data to be used to ground LLMs, delivering even more accurate AI solutions than those based on LLMs grounded with data at rest alone. Another goody announced in London is Atlas Vector Search’s integration with managed data streams from Confluent Cloud, the streaming platform from the company founded by Apache Kafka’s creators. MongoDB also announced in London that vector index creation times have been reduced by up to 85%.

With Atlas Stream Processing, MongoDB Time Series collections, and Atlas Vector Search mutually aware and intertwined, the breadth and depth of applications MongoDB can address grows immensely.

It Takes a Platform

What should become clear here is that, even though each of these new capabilities MongoDB announced in New York and London is important by itself, the real power comes from their interoperability on a single platform. Streaming data, time series processing, aggregational analysis, vector processing and search workloads overall are collectively critical for the development of sophisticated modern applications.

And while a so-called best-of-breed approach, which might involve tying together separate operational databases, streaming data, vector databases, time series databases and search platforms, may at first appear to be state-of-the-art, that simply isn’t the case. Combining so many constituent platforms inherently increases complexity, not just of the architecture and technology, but of the developer talent, and the resulting team structure, required to use them in combination. It complicates procurement, maintenance, support and the obsolescence cycle of the technology, too.

To be sure, MongoDB is not the only vendor pursuing the unified platform approach. Arguably, Microsoft has done this for decades, for example. And in the stable of newer vendors, the single platform approach is one taken very seriously by such players as Snowflake and Databricks, as well. MongoDB isn’t a unicorn here, nor should it be.

Instead, it’s acting as a prudent advocate for its customers, aiming to tame the tech landscape’s complexity, creating efficiencies for developers and the companies that employ them. Ultimately, that cuts costs, and it reduces risk. Enterprise software companies internalize that ethos and MongoDB is certainly in the club.

With that developer sensibility in mind, I should note that, in New York, MongoDB announced several pure developer advances too. These include important additions like expanded MongoDB Atlas programming language support to simplify deploying resources on AWS; a new Kotlin driver for building server-side applications on MongoDB; an Atlas Kubernetes Operator that simplifies containerized applications; and general availability of the MongoDB-supported PyMongoArrow library for Python, which allows data scientists and other machine learning professionals to work with data in MongoDB.

On the cloud provider side, MongoDB Atlas Online Archive and Atlas Data Federation are now available on Microsoft Azure, and Atlas now integrates with Google Cloud’s Vertex AI LLMs.

Data to Go

The last set of capabilities announced in London, rather than focusing on what MongoDB can do, pertains to where MongoDB can do it, in the context of both deployment and development. MongoDB Atlas for the Edge allows instances of MongoDB deployed off-cloud, including at the edge, to federate themselves with Atlas, even in situations where Internet connectivity is intermittent.

This is facilitated by MongoDB Atlas Edge Server nodes, which manage the synchronization of data across devices and the cloud, either in real-time when connectivity is open, or after the fact when connectivity is reestablished after an interruption in coverage. This even permits vector embeddings to be generated and stored while connectivity is intact and allows subsequent generative AI functionality to be leveraged at the edge even when the application is offline, using vector search and the stored embeddings.
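The "sync now if connected, sync later if not" behavior described here follows a familiar store-and-forward pattern. The sketch below illustrates that general pattern only; it is not how Atlas Edge Server is implemented, and the `EdgeBuffer` class and its method names are invented for the example. A production sync layer would also need to deal with conflict resolution, retries and durability, none of which appear here.

```python
class EdgeBuffer:
    """Generic store-and-forward buffer (hypothetical, for illustration)."""

    def __init__(self):
        self.online = False
        self.pending = []  # writes made while offline
        self.cloud = []    # stand-in for the cloud database

    def write(self, doc):
        """Send the write straight through when online, else queue it."""
        if self.online:
            self.cloud.append(doc)
        else:
            self.pending.append(doc)

    def disconnect(self):
        self.online = False

    def reconnect(self):
        """Flush queued writes, in order, once connectivity returns."""
        self.online = True
        self.cloud.extend(self.pending)
        self.pending.clear()


buffer = EdgeBuffer()
buffer.write({"reading": 1})  # offline: queued locally
buffer.write({"reading": 2})  # offline: queued locally
buffer.reconnect()            # queued writes are flushed
buffer.write({"reading": 3})  # online: written through immediately
```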

Also new is the Atlas CLI (command line interface), which provides a local development option for MongoDB Atlas. The Atlas CLI allows developers to work in laptop-based and other local environments and still interact with both local and cloud-based Atlas database deployments, Atlas Search and Vector Search. The CLI accommodates setup of, and connection to, Atlas environments, as well as automation of common management tasks for them.

It eliminates the need to set up a cloud dev environment for the kind of quick prototyping and testing of applications that might be more appropriately done on laptops or self-managed workstations. So regardless of development or target deployment locations, developers can leverage the platform conveniences and capabilities that Atlas provides.

Not Just Laundry Listing

In the span of a few months, MongoDB announced generative AI features for development, along with vector storage, indexing and search for LLM use in applications. Streaming data came aboard, as did seriously expanded support for edge applications. Dedicated search workloads, as well as enhanced time series and aggregational analysis, have been added to the roster, as have multicloud deployment and local development. Compatibility with Kubernetes and multiple programming languages was added for good measure, as were integrations with Google Vertex AI and Confluent Cloud.

This series of announcements, made at multiple one-day events, may look like a collection of piecemeal improvements. But MongoDB’s announcements go beyond prolific feature development, into a real transformation, taking what was once a rebel database and developing it into a broad ecosystem for enterprise customers.

Instead of being typecast by its developer appeal, MongoDB has embraced and transcended it, making the platform the fulcrum of its strategy, and not just a gimmick on top of its database. That’s not easy to do, and it raises the pressure for what comes next. But it’s a big achievement, even if it only appears to be a series of incremental improvements. Hopefully, I’ve brought that significant shift out of the shadows, and hopefully, MongoDB can keep it in the spotlight.

Disclosure: MongoDB is a client of post author Andrew Brust’s company, Blue Badge Insights.
