Microsoft’s Cosmos DB Gets Cassandra-Compatible APIs
Microsoft has expanded its Azure-hosted Cosmos DB to serve as a drop-in replacement for the Apache Cassandra data store.
Microsoft launched Cosmos DB earlier this year as a hosted JSON document database, though the architecture was always designed to support multiple APIs and data models, giving it the ability to mimic the interfaces of other data stores. Now Cassandra is joining MongoDB, Gremlin, standard SQL and Azure Tables as database systems Cosmos DB supports natively.
The company announced the update during its Connect() 2017 event, held Wednesday in New York.
The new API support, released in preview today, gives developers “Cassandra-as-a-service powered by Cosmos DB,” said Rimma Nehme, Microsoft group product manager for Cosmos DB and Azure HDInsight. “You get the familiarity of your favorite Cassandra IDEs and tools, and you can take your existing apps and make them planet-scale by changing the URI and pointing to our platform.”
Support for Cassandra has been one of the top requests from customers, she told The New Stack. Some of them want the simplicity of a managed service, with the security and compliance of a cloud service. “Cassandra developers frequently build large-scale distributed system and that comes with a lot of burden and complexity. By having this API, out of the box, Cassandra developers get a fully managed Cassandra as a service, as a true PaaS service that takes advantage of Azure.”
Others have built their own Cassandra systems that they need to grow beyond the resources of their own data centers — but don’t want to rewrite the application they’ve already created. “Typically, these are the customers who are pained by the scale issues, whether they have large-scale data sets or they particularly need the global distribution,” she said.
One of the big advantages of Cosmos DB is that it offers more than the usual range of consistency models, with intermediate consistency models of bounded staleness, session consistency and consistent prefix, which have clear trade-offs. “It’s a fundamental choice between performance, latency and availability, but also the dollar costs. The stronger the consistency that you have, the more processing you have to run, the more distributed algorithms, it consumes more of your throughput, so you will have to pay more for it,” Nehme said.
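To make the trade-offs concrete, here is a minimal toy sketch of how the different consistency levels change what a read can return. It is not Cosmos DB code: the `ToyStore` class, its single lagging replica and the `max_lag` parameter are all assumptions made for illustration. Eventual consistency is left out because this toy replica always applies writes in order.

```python
class ToyStore:
    """Toy model (not Cosmos DB): one primary plus one replica that lags behind."""

    def __init__(self):
        self.primary = []          # every write, in order; version = position + 1
        self.replica_version = 0   # how many writes the replica has applied so far

    def write(self, value):
        """Append a write and return its version, usable as a session token."""
        self.primary.append(value)
        return len(self.primary)

    def replicate(self, count=1):
        """Let the replica apply up to `count` more writes, in order."""
        self.replica_version = min(len(self.primary), self.replica_version + count)

    def read(self, level, session_token=0, max_lag=0):
        """Return the value visible to a read at the given consistency level."""
        latest = len(self.primary)
        if level == "strong":
            version = latest                                  # always the newest write
        elif level == "bounded_staleness":
            version = max(self.replica_version, latest - max_lag)  # at most max_lag behind
        elif level == "session":
            version = max(self.replica_version, session_token)     # read your own writes
        elif level == "consistent_prefix":
            version = self.replica_version                    # some in-order prefix
        else:
            raise ValueError(f"unknown consistency level: {level}")
        return self.primary[version - 1] if version else None


store = ToyStore()
store.write("a")
store.replicate()            # the replica has now seen "a"
store.write("b")
token = store.write("c")     # the replica is two writes behind

for level, kwargs in [
    ("strong", {}),
    ("bounded_staleness", {"max_lag": 1}),
    ("session", {"session_token": token}),
    ("consistent_prefix", {}),
]:
    print(level, "->", store.read(level, **kwargs))
```

In this sketch the stronger levels have to consult the newest state, while the weaker ones can answer from whatever the lagging replica already holds, which is the cost difference Nehme describes.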
More than 93 percent of Cosmos DB customers pick one of the intermediate consistency models, often the less expensive session consistency, she noted.
Cosmos DB offers what Nehme called “wire protocol level compatibility” with existing Cassandra SDKs and tools. As a result, developers may not have to rewrite their apps when switching the backend to Cosmos DB.
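As a rough illustration of what a drop-in switch could look like from the application's side, the sketch below keeps the queries fixed and changes only the connection settings. It uses no real driver API: the `Endpoint` class, the hostnames, the managed-service port and the account name are placeholders invented for this example.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Connection settings a Cassandra client needs (illustrative, not a driver API)."""
    host: str
    port: int
    use_tls: bool
    username: Optional[str] = None


# Placeholder values for a self-hosted cluster; 9042 is Cassandra's default CQL port.
SELF_HOSTED = Endpoint(host="cassandra.internal.example", port=9042, use_tls=False)

# Placeholder values for a managed, Cassandra-compatible service.
MANAGED = replace(
    SELF_HOSTED,
    host="my-account.managed-cassandra.example",
    port=10350,
    use_tls=True,
    username="my-account",
)

# The application's CQL statements stay the same whichever endpoint is used.
QUERIES = (
    "INSERT INTO shop.orders (id, total) VALUES (%s, %s)",
    "SELECT total FROM shop.orders WHERE id = %s",
)


def changed_settings(old, new):
    """Return the names of the settings that differ between two endpoints."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


print(changed_settings(SELF_HOSTED, MANAGED))
```

Only the endpoint fields differ between the two configurations; the query strings are untouched, which is the point of compatibility at the wire protocol level.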
This isn’t using the existing open source Cassandra database code; instead, Microsoft has implemented the API and wire protocols as modules on top of the Cosmos DB atom-record-sequence (ARS) data model. In the preview, not all of the Cassandra APIs are included; Nehme said most of them are covered but the support isn’t yet as extensive as the MongoDB coverage.
“The typical patterns of Cassandra usage are actually more of the CRUD [create, read, update and delete] workloads than queries. All of the CRUD functionality is working, the querying capabilities are working. We’ve taken a few canonical tier one workloads on Cassandra and made sure that we’ve got everything covered and we will monitor customer workloads and address the last mile things to take it to general availability.”
The support is good enough that some early customers for the private preview are already running workloads in production. Jet.com is already running what Nehme called “several mission-critical use cases” with Cassandra APIs on Cosmos DB.
Nehme expects some customers to continue using Cassandra in their own data centers, alongside running the same database applications on Cosmos DB, which has happened with MongoDB developers. Sometimes that’s as part of a phased migration to the cloud; “you might not cut the umbilical cord overnight, you might have it running side by side in parallel for some time.” Others will keep the open source NoSQL databases as local development tools.
Multiple Models – At Once
With so many APIs supported now, Cosmos DB gives developers the option of using multiple data models and query APIs on the same data. “If you look at the apps being built in the cloud a pattern we see frequently is polyglot apps with a very fragmented backend,” Nehme said. One database may run the transactional workload, while a key-value store may capture the Internet of Things telemetry data. A document database stores inventory or user profile data, and a graph database is used for connections between customers, suppliers and other parties involved in the business.
“By having one database that gives you this multi-model support, that’s applicable to all of those data models seamlessly,” she said.
Even if no code needs to be rewritten, moving to Cosmos DB does mean translating Cassandra settings. To help with migration, Microsoft is working on documentation that maps settings Cassandra developers will be familiar with to the five consistency models available in Cosmos DB. It’s also planning to help all developers choose the right consistency model for their workloads by adding a graph to the database metrics that shows how likely their data is to become consistent, using the Probabilistically Bounded Staleness model, which attempts to answer the question of “how eventual is eventual consistency?”
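The kind of question Probabilistically Bounded Staleness asks can be sketched with a small Monte Carlo simulation. This is a simplified toy, not Microsoft's metric or the full PBS model: it assumes a write reaches each replica after an independent, exponentially distributed delay, and that a read contacts a random subset of replicas. The function name and all parameters are made up for illustration.

```python
import random


def estimate_consistency(t_ms, mean_delay_ms, replicas=3, read_replicas=1,
                         trials=20000, seed=0):
    """Estimate the chance that a read issued t_ms after a write sees that write.

    Toy assumptions: each replica receives the write after an independent
    exponential delay, and the read returns the newest value among
    `read_replicas` replicas picked at random.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(trials):
        # Time at which each replica has applied the write.
        delays = [rng.expovariate(1.0 / mean_delay_ms) for _ in range(replicas)]
        contacted = rng.sample(delays, read_replicas)
        # The read is up to date if any contacted replica already has the write.
        if min(contacted) <= t_ms:
            hits += 1
    return hits / trials


for t in (1, 5, 10, 50):
    p = estimate_consistency(t_ms=t, mean_delay_ms=10)
    print(f"{t:>3} ms after the write: ~{p:.2f} chance of reading it")
```

Plotting that probability against elapsed time gives a curve of the sort the article describes: the longer a reader waits, the more likely an eventually consistent read is to return the latest data.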
“If you’re building an app using Cosmos DB for web comments or more exploratory workloads, it’s OK if your data is not perfectly consistent; if you’re building your shopping cart, you want to make sure the data is reflected in a strongly consistent manner right away,” Nehme explained.
Beyond Cassandra support, Cosmos DB has been augmented in other ways as well. Cosmos DB’s MongoDB support is also improving with a new public preview of aggregation framework support and unique indices. “Unique index allows you to introduce uniqueness constraints on any document fields on already automatically indexed documents, which is a core capability of Cosmos DB, and aggregation pipeline allows you to run more sophisticated aggregations on top of your data using MongoDB native APIs.”
The Table API is moving from preview to general availability, and the Gremlin API will soon follow, with improvements to performance, import and backup, plus Python client support and more options for using the open source frameworks recommended by Apache TinkerPop. Nehme also promised ways to simplify migrations from Neo4j and DataStax’s TitanDB. For TitanDB users, Cosmos DB could serve as an alternative to JanusGraph, the fork of the graph database that emerged after its acquisition by DataStax.
The next API to be added to Cosmos DB will likely be HBase, which fits in with the existing Spark connector. That might even remove some of the artificial divisions between operational and analytics databases introduced when database systems couldn’t cope with running both at the same time.
Microsoft is a sponsor of The New Stack.
Feature image: Microsoft technical Architect Lara Rubbelkey, introducing Cosmos DB improvements, at Microsoft’s Connect 2017 conference in New York.