MongoDB’s Return to Its Data Management Roots
When MongoDB Inc. started in New York 10 years ago, its initial plan was to build a full platform for managing data, but over time the company narrowed its focus to a NoSQL database. Now it’s going back to that original vision.
“It’s like we took an eight-year break,” Eliot Horowitz, chief technology officer and co-founder, said in an interview.
Dev Ittycheria, president and CEO, outlined the company’s achievements in the opening keynote at MongoDB World 2017 in Chicago, where the company’s leadership also laid out how they expect that initial idea to unfold, in both the near term and the longer term.
The company announced Stitch, a backend-as-a-service designed to handle routine development tasks, and said that Atlas, its database-as-a-service, is now supported on Azure and Google Cloud as well as AWS.
It also demonstrated Charts, a visualization feature coming in version 3.6 that won’t require flattening JSON data to make it look relational. Charts will let developers explore their data directly from a Web-based interface connected to the document store.
Horowitz outlined other features coming in 3.6, due out in November, including:
The BI Connector, updated to version 2.0 in January, will be added to Ops Manager later this year. It lets you use MongoDB as a data source for SQL-based business intelligence and analytics platforms; the recently released Tableau 10.3, for instance, includes a button to connect directly to MongoDB.
The company is also giving users the ability to perform more kinds of joins and subqueries, plus a visual way to build aggregation pipelines.
These go beyond the $lookup joins introduced in version 3.2.
Using a catalog as an example with products including a T-shirt and a battery, Horowitz explained:
“Say we want to display a web page with an order with every line item and show the average rating for each. Right now you’d have to do a query for every line item, computing the average rating. It’s totally possible, but you’re going to make n+1 calls to the database. In 3.6, you can do it in a single database call. We’ve extended the $lookup operator and you can create sub-pipeline lookups so you can do any kind of join you want, any kind of subquery you want in the aggregation pipeline as a single query.”
In this example, the orders collection is joined with the reviews collection: for each line item, the query computes the average rating and puts it back into the original order document.
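A minimal sketch of the single-call version, assuming an orders collection whose documents hold a `lineItems` array of `{productId, price}` entries and a reviews collection with `productId` and `rating` fields (these names are hypothetical; the article does not give the schema). The list returned by `order_with_ratings_pipeline` uses the `let`/`pipeline` form of `$lookup` added in 3.6 and could be handed to a driver call such as PyMongo's `Collection.aggregate`; the pure-Python function after it only illustrates the shape of the result, so the example runs without a server.

```python
def order_with_ratings_pipeline(order_id):
    """Aggregation pipeline (3.6+ syntax) that joins one order's line items
    with the reviews collection and adds an average rating to each item."""
    return [
        {"$match": {"_id": order_id}},
        # One document per line item, so each item is joined separately.
        {"$unwind": "$lineItems"},
        {"$lookup": {
            "from": "reviews",
            # Expose the item's product ID to the sub-pipeline as $$pid.
            "let": {"pid": "$lineItems.productId"},
            # The sub-pipeline runs inside the join: filter, then aggregate.
            "pipeline": [
                {"$match": {"$expr": {"$eq": ["$productId", "$$pid"]}}},
                {"$group": {"_id": None, "avgRating": {"$avg": "$rating"}}},
            ],
            "as": "ratingStats",
        }},
        # Copy the computed average onto the line item itself.
        {"$addFields": {
            "lineItems.avgRating": {"$arrayElemAt": ["$ratingStats.avgRating", 0]},
        }},
        # Regroup the enriched line items under the order's _id
        # (other order fields would need {"$first": ...} to be carried along).
        {"$group": {"_id": "$_id", "lineItems": {"$push": "$lineItems"}}},
    ]


def order_with_ratings_in_memory(order, reviews):
    """Pure-Python stand-in for the pipeline, for illustration only."""
    items = []
    for item in order["lineItems"]:
        ratings = [r["rating"] for r in reviews if r["productId"] == item["productId"]]
        enriched = dict(item)  # copy so the input order is left unchanged
        if ratings:  # items with no reviews get no avgRating field here
            enriched["avgRating"] = sum(ratings) / len(ratings)
        items.append(enriched)
    return {"_id": order["_id"], "lineItems": items}


# Hypothetical sample data mirroring the T-shirt and battery example.
sample_order = {"_id": 1, "lineItems": [
    {"productId": "tshirt", "price": 20.0},
    {"productId": "battery", "price": 5.0},
]}
sample_reviews = [
    {"productId": "tshirt", "rating": 4},
    {"productId": "tshirt", "rating": 5},
    {"productId": "battery", "rating": 3},
]
result = order_with_ratings_in_memory(sample_order, sample_reviews)
```

The design point is that the join and the per-item averaging both happen inside one `aggregate` call on the server, rather than in one query per line item.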
The second change involves updating array elements with update operators:
“Say you’ve messed up an order, and you want to give this user a 20 percent discount on every item they order. In 3.2 and 3.4, you’d have to do an update for every line item. Or you’d have to call up the document and modify it and put it back. Wouldn’t it be nice if you could do that with a single query? So in 3.6 you can. Here we’re multiplying the price of every line item in the order by 0.8, giving a 20 percent discount. What if we only want to update some elements? What if we only want to update the items we haven’t shipped yet? We could do that as well.”
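Both cases can be sketched as single update documents using the array operators that 3.6 introduced: `$[]` targets every element of an array, and `$[<identifier>]` targets only the elements matched by an `arrayFilters` condition. The field names `lineItems`, `price` and `shipped` are hypothetical. With PyMongo the filtered form would be sent as something like `orders.update_one({"_id": 1}, update, array_filters=array_filters)`; the in-memory function is only an illustration of the effect.

```python
import copy


def discount_all_update(factor):
    """Update document (3.6+) that multiplies every line item's price.
    `$[]` is the all-positional operator: it targets each array element."""
    return {"$mul": {"lineItems.$[].price": factor}}


def discount_unshipped_update(factor):
    """Update document plus arrayFilters (3.6+) that touch only unshipped items.
    `item` is an identifier bound by the filter; elements whose `shipped`
    field is not true (including elements with no `shipped` field) match."""
    update = {"$mul": {"lineItems.$[item].price": factor}}
    array_filters = [{"item.shipped": {"$ne": True}}]
    return update, array_filters


def apply_discount_in_memory(order, factor, only_unshipped=False):
    """Pure-Python illustration of the server-side effect; returns a new dict."""
    updated = copy.deepcopy(order)
    for item in updated["lineItems"]:
        if only_unshipped and item.get("shipped") is True:
            continue  # shipped items are left at their original price
        item["price"] *= factor
    return updated


# Hypothetical order: the T-shirt has shipped, the battery has not.
order = {"_id": 1, "lineItems": [
    {"productId": "tshirt", "price": 20.0, "shipped": True},
    {"productId": "battery", "price": 5.0, "shipped": False},
]}
all_discounted = apply_discount_in_memory(order, 0.8)
unshipped_discounted = apply_discount_in_memory(order, 0.8, only_unshipped=True)
```

Using `$ne: True` rather than equality with `False` is a deliberate choice in this sketch, so that line items lacking a `shipped` field also count as unshipped.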
It also provides the ability to easily make updates no matter how deeply nested your documents are, he said.
Document validation will gain JSON Schema support, tightening control over collections that must follow a more consistent structure. It lets you say things like “This document has to look exactly like this with no deviations” or “This document has to have these fields, but it can have other fields as well,” he explained.
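The two kinds of rule Horowitz describes map onto the `$jsonSchema` validator keyword roughly as follows: a strict schema sets `additionalProperties` to false, while a lenient one only lists `required` fields. The collection layout is hypothetical, and such a validator could be supplied when a collection is created, for example via PyMongo's `db.create_collection("orders", validator=strict_validator)`. The `schema_violations` helper is a toy that checks only two keywords so the example runs offline; it is not how MongoDB validates documents.

```python
# Strict: "this document has to look exactly like this with no deviations".
strict_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["customer", "lineItems"],
        "properties": {
            # With additionalProperties false, _id must be listed explicitly.
            "_id": {"bsonType": "objectId"},
            "customer": {"bsonType": "string"},
            "lineItems": {"bsonType": "array"},
        },
        "additionalProperties": False,
    }
}

# Lenient: "these fields are required, but others are allowed as well".
lenient_validator = {
    "$jsonSchema": {
        key: value
        for key, value in strict_validator["$jsonSchema"].items()
        if key != "additionalProperties"
    }
}


def schema_violations(doc, validator):
    """Toy checker for the `required` and `additionalProperties` keywords only;
    MongoDB's real validator also enforces bsonType and many other keywords."""
    schema = validator["$jsonSchema"]
    problems = [f"missing field: {name}"
                for name in schema.get("required", []) if name not in doc]
    if schema.get("additionalProperties", True) is False:
        allowed = set(schema.get("properties", {}))
        problems += [f"unexpected field: {name}" for name in doc if name not in allowed]
    return problems


# Hypothetical document with one field the strict schema does not declare.
doc = {"customer": "alice", "lineItems": [], "giftNote": "ring the bell"}
```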
And to cope with network errors or other disruptions, 3.6 introduces “retryable writes”: the drivers and the server coordinate so that every write operation carries an ID, which lets the driver safely retry it while guaranteeing that it is applied exactly once.
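The snippet below is a toy model of that idea, not MongoDB's implementation or wire protocol. A hypothetical `ToyServer` remembers the ID of every write it has applied, so when a hypothetical `ToyDriver` loses a reply and resends the same operation with the same ID, the server returns the recorded result instead of applying the write a second time. With a real 3.6 driver the feature is switched on through the `retryWrites=true` connection-string option.

```python
import itertools


class FlakyNetworkError(Exception):
    """Stands in for a dropped connection after the server applied the write."""


class ToyServer:
    def __init__(self):
        self.balance = 0
        self._applied = {}  # operation ID -> result of the first application

    def apply(self, op_id, amount):
        if op_id in self._applied:
            # Duplicate (a retry): return the recorded result, do not re-apply.
            return self._applied[op_id]
        self.balance += amount
        self._applied[op_id] = self.balance
        return self.balance


class ToyDriver:
    def __init__(self, server, fail_first_reply=False):
        self.server = server
        self._ids = itertools.count(1)
        self._fail_next = fail_first_reply

    def _send(self, op_id, amount):
        result = self.server.apply(op_id, amount)
        if self._fail_next:
            # Simulate losing the reply after the server has done the work.
            self._fail_next = False
            raise FlakyNetworkError()
        return result

    def increment(self, amount):
        op_id = next(self._ids)  # one ID per logical write, reused on retry
        try:
            return self._send(op_id, amount)
        except FlakyNetworkError:
            return self._send(op_id, amount)  # retry once with the same ID


server = ToyServer()
driver = ToyDriver(server, fail_first_reply=True)
first = driver.increment(10)   # reply lost once, then retried
second = driver.increment(5)
```

Without the ID check in `ToyServer.apply`, the retried increment would be applied twice and the balance would end up at 20 after the first call.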
To prevent users from accidentally exposing data on the internet (the company has taken heat for what some have called “loose default configurations” that led to exposed data), 3.6 will by default accept connections only from localhost. You will have to add a flag that, in effect, says “Open this up to the internet.”
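The localhost-only default is about which network addresses `mongod` listens on; in 3.6 those are widened with the `--bind_ip` option, which takes a comma-separated list of addresses, or with the `--bind_ip_all` flag. As a hedged illustration, the hypothetical helper below uses Python's standard `ipaddress` module to report which entries in a `bind_ip`-style value are not loopback addresses and would therefore accept connections from other machines.

```python
import ipaddress


def exposed_addresses(bind_ip):
    """Return the entries of a comma-separated bind_ip value that are not loopback.
    Hypothetical audit helper; hostnames other than "localhost" are treated as
    potentially exposed because they are not resolved here."""
    exposed = []
    for entry in (part.strip() for part in bind_ip.split(",")):
        if entry == "localhost":
            continue
        try:
            if ipaddress.ip_address(entry).is_loopback:
                continue
        except ValueError:
            pass  # not a literal IP address: keep it in the report
        exposed.append(entry)
    return exposed
```

For example, `exposed_addresses("127.0.0.1,::1")` returns an empty list, while `exposed_addresses("0.0.0.0")` flags the wildcard address.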
Version 3.6 also adds a change streams API that lets you pull data out in real time as changes happen in the database. You can subscribe to all changes on a collection or only to a subset of them, and you can watch a subset of the data on a sharded cluster.
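Subscribing to a subset of changes amounts to attaching an aggregation-style filter to the stream. The sketch below builds such a filter on the `operationType` field of change events; with PyMongo 3.6 or later it could be passed as `collection.watch(pipeline)`, which yields change documents as they arrive (change streams require a replica set or sharded cluster). The in-memory filter and the sample events are hypothetical stand-ins so the example runs without a server.

```python
def change_filter(operation_types):
    """Change stream pipeline that keeps only the given operation types."""
    return [{"$match": {"operationType": {"$in": list(operation_types)}}}]


def filter_events_in_memory(events, pipeline):
    """Toy stand-in for the server: apply the single $match stage built above."""
    wanted = set(pipeline[0]["$match"]["operationType"]["$in"])
    for event in events:
        if event["operationType"] in wanted:
            yield event


# Hypothetical change events, trimmed to two of the fields a real event carries.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
    {"operationType": "update", "documentKey": {"_id": 1}},
]
pipeline = change_filter(["insert", "update"])
seen = [e["operationType"] for e in filter_events_in_memory(events, pipeline)]
```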
Beta programs for 3.6 are to begin within a few months, though the company hasn’t announced exactly when.
Horowitz told Datanami that, in the longer term, the company is looking at building a parallelized query execution engine and a column store, the storage layout that major analytical relational databases use to speed up analytics. A column store is a year or two away, he said.
Feature image via Pixabay.