MarkLogic Expands from XML Roots to Embrace JSON
MarkLogic is a prominent NoSQL database vendor, focusing on enterprise-level application workloads. Large-scale companies often run mission-critical applications with customer data that can change by the minute. Within the enterprise, MarkLogic’s NoSQL database management systems focus on managing different data models: those that cannot be easily encapsulated within the traditional relational data model.
As such, MarkLogic’s database systems are schema agnostic. Rather than making developers define data in specific rows, columns, or tables, MarkLogic allows customers to use multiple or custom schemas, along with semantics, to better model data, knowledge graphs, geospatial queries, and more. MongoDB and Couchbase share this “custom-schema” approach, as do many NoSQL database options on the market today. MarkLogic’s semantics support, built on its XML foundation, stores facts as RDF (Resource Description Framework) triples, which can be queried with the standard query language SPARQL.
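To make the triple model concrete, here is a minimal sketch in Python. It is a toy in-memory store and does not use MarkLogic's API. The sample facts, the `TRIPLES` list, and the `match` helper are all assumptions invented for illustration. Each fact is a (subject, predicate, object) triple, and a pattern with unknowns plays the role of a simple SPARQL query.

```python
# Illustrative only: a tiny in-memory triple store, not MarkLogic's API.
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample facts.
TRIPLES: list[Triple] = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "Boston"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None behaves like a SPARQL variable."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly equivalent to: SELECT ?person WHERE { ?person :worksFor :acme }
employees = [subject for subject, _, _ in match(p="worksFor", o="acme")]
```

A real SPARQL engine adds joins across several patterns, namespaces, and indexes, but the underlying idea of matching patterns against triples is the same.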
Although the company started at a time when XML was cresting the peak of its hype curve, MarkLogic has since shifted from XML and XQuery toward JSON as the primary form of interaction, after finding that its customers faced pain points such as having to program their app interfaces in XQuery.
“The JSON part gets MarkLogic out of the usually-losing side of the XML/JSON debate,” wrote Curt Monash, head of database management systems analyst firm Monash Research.
Trusting Your Health to NoSQL
Unlike most NoSQL databases, MarkLogic provides for transactional processing, complying with the ACID transactional properties.
ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity is an all-or-nothing approach to transaction operations occurring within a database. When a database processes a transaction, the transaction is either completed in full or not processed at all. This ensures that no transaction is processed only in part, meaning less data corruption can occur when running at scale.
Consistency ensures that data follows the rules defined in the database. If a transaction doesn’t fall within the rules set or returns invalid results, the database is rolled back to a previous state that follows them. Isolation is crucial for enterprise-level database use cases such as those in the banking sector. When withdrawals are made, each transaction runs independently, without interference from the others, although the order in which transactions run is not enforced. Durability ensures that once a transaction has been committed, its changes survive a subsequent system failure. A transaction that had not completed when the failure occurred is rolled back, returning the database to its previous state.
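The atomicity and consistency guarantees described above can be sketched with Python's built-in `sqlite3` module. SQLite is a relational database and is used here only because it ships with Python. This is not MarkLogic code, and the `accounts` table and `transfer` helper are assumptions made up for the example. A transfer that would overdraw an account violates a rule defined in the database, so the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "rule defined in the database".
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move funds in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict[str, int]:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer("alice", "bob", 30)         # valid: both updates are committed
rejected = transfer("alice", "bob", 500)  # would overdraw alice: rolled back
```

After the rejected transfer, neither balance has changed, even though the credit to the destination account ran before the failing debit.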
One of the most high-profile use cases of MarkLogic has been with the U.S. Healthcare.gov site, a national health insurance enrollment portal, which was criticized for sluggish performance when launched. There were large spikes in load as users tried to sign up for policies at the last minute, with over 50,000 users active on the Healthcare.gov portal at peak times.
The trials and tribulations of the Healthcare.gov launch are well documented. The contractor’s developers were largely unfamiliar with the MarkLogic technology and/or XQuery, though the database system appeared not to be the culprit for the sluggishness. In fact, MarkLogic was key to making the system work as planned.
Each state in Healthcare.gov possessed a different schema for collecting customer data, while the MarkLogic database also had to ingest data from insurance providers about the plans they had available on the health care marketplace.
“The database also had to integrate with disparate federal organizations, the IRS, immigration – all with different data formats, to bring together eligibility calculations, plans, shopping, and handle the user experience when people purchased a plan,” said Joe Pasqua of MarkLogic.
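As a rough sketch of what schema-agnostic ingestion means in practice, the Python below stores differently shaped JSON records side by side and searches them by field name without declaring a schema first. The records, field names, and the `walk` and `find` helpers are all hypothetical. They are not Healthcare.gov data or MarkLogic's query API.

```python
import json
from typing import Any, Iterator

# Hypothetical records: field names and shapes are invented for illustration.
raw_documents = [
    '{"state": "VA", "applicant": {"name": "Ann Lee", "household_size": 3}}',
    '{"state": "TX", "full_name": "Raj Patel", "dependents": ["Maya", "Dev"]}',
    '{"insurer": "ExampleCare", "plan": {"tier": "silver", "state": "VA"}}',
]

# Documents are stored as-is; no common schema is imposed on them.
store: list[dict[str, Any]] = [json.loads(doc) for doc in raw_documents]


def walk(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every leaf value in a nested document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)
    else:
        yield path, node


def find(key: str, value: Any) -> list[dict[str, Any]]:
    """Return documents where a field named `key`, at any depth, equals `value`."""
    return [
        doc for doc in store
        if any(p.rsplit("/", 1)[-1] == key and v == value
               for p, v in walk(doc))
    ]


# Matches the Virginia applicant and the Virginia plan despite their shapes.
virginia = find("state", "VA")
```

A production document database would index these paths rather than scanning every document, but the query shape is similar: ask for a field and value, whatever structure surrounds them.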
A standard relational database could not collect data at the scale needed to run the workload the Healthcare.gov site was expected to receive. So, the Healthcare.gov technical team started the path toward adopting NoSQL as the database architecture for Healthcare.gov, though after several years of work they were unsuccessful.
MarkLogic was brought onto Healthcare.gov because it had the flexibility to deal with data formats from different states, agencies, and insurance companies. Because of the heavy data storage volume and usage levels, Healthcare.gov required an enterprise-level scale-out architecture. MarkLogic approached the project with a scale-out cluster, using a shared-nothing architecture for the database. During times of peak load, more nodes would be added, and data would then be redistributed across them while the system was running, so that incoming load could be spread evenly. When the peak passes, the cluster can be shrunk back down, which allows for elasticity when and where it is needed.
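One generic way to picture redistributing data when nodes join or leave a shared-nothing cluster is rendezvous hashing, sketched below in Python. This is an assumption chosen for illustration and not a description of how MarkLogic actually assigns or rebalances data. The node names, document IDs, and the `owner` and `place` helpers are hypothetical.

```python
import hashlib


def owner(doc_id: str, nodes: list[str]) -> str:
    """Pick the node with the highest hash score for this document."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{doc_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def place(doc_ids: list[str], nodes: list[str]) -> dict[str, str]:
    """Map every document to the node that owns it."""
    return {doc_id: owner(doc_id, nodes) for doc_id in doc_ids}


docs = [f"application-{i}" for i in range(1000)]
base_nodes = ["node-a", "node-b", "node-c"]

before = place(docs, base_nodes)
after = place(docs, base_nodes + ["node-d"])  # scale out for peak load

# Only documents claimed by the new node change owner.
moved = [d for d in docs if before[d] != after[d]]

# Scaling back in returns every document to its original node.
shrunk = place(docs, base_nodes)
```

With this scheme, adding a fourth node moves about a quarter of the documents on average, all of them onto the new node, while the rest stay where they are. That property is what makes growing and shrinking a running cluster practical.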
An unforeseen challenge for MarkLogic during the launch of Healthcare.gov was the physical infrastructure itself, noted Pasqua. There were frequent hardware outages, network outages, and rolling outages in which storage would go down. This was a concern for MarkLogic’s counterparts at Healthcare.gov, as storage outages meant that customers might miss their open enrollment period, and data losses could result in costly errors for insurance companies if plans were lost or user data was corrupted in an outage. MarkLogic addressed the need for customer data to persist through continued rolling outages with its rapid recovery tools, such as point-in-time backup recovery and incremental backups.
Though MarkLogic has faced trials, it has built a solid database with features that differentiate it from other NoSQL offerings on the market for today’s enterprise-level customers. With more companies seeking to understand and act on the data they collect about their users, the opportunity to build on NoSQL database features to streamline data collection and analysis at scale continues to grow.
Feature Image: “Catbent” by Tegelpoca is licensed under CC BY-SA 2.0.