
Distributed SQL Takes Databases to the Next Level

15 Nov 2021 11:32am, by
Rob Hedgpeth
Rob Hedgpeth is the Director of Developer Relations for MariaDB Corporation. He has been writing code since the early 2000s. Like many others, he started his journey by building pretty horrendous-looking websites. Fortunately for the world, he has since evolved and branched out to a variety of projects across web, desktop, mobile, and IoT. Over the years he has contributed to the architecture and development of many apps using a large array of languages and technologies. Now, as a developer evangelist for MariaDB Corporation, Rob gets to combine his love for technology with his mission to fuel developers' curiosity and passion.

Across three decades, the evolution of databases has been a crucial part of technological innovation for every industry and for businesses of every size around the world. In the late 1990s and into the early 2000s, databases hit a performance wall. Hard. The success of the internet, in combination with the explosion of application development, led to massive volumes of data and a velocity of data expansion never seen before. Ingesting that data at the time led to major database bottlenecks. Increased latency and decreased throughput were common problems that mucked up the gears of business productivity, ultimately slowing revenue growth. Those problems never really went away.

At the heart of this great database quandary has been the issue of elasticity — being able to scale database capacity up or down on demand — while maintaining high performance and, ideally, low costs. Ops teams and database engineers have sought to tackle scaling challenges by evolving multiple approaches, many of which still exist across enterprises. Only by learning from the disadvantages of those approaches have we arrived at a solution that doesn’t force trade-offs: distributed SQL.

Earlier Approaches: Costly, Brittle, Limited

The first efforts to achieve some measure of scalability were not-so-complex workarounds. Under pressure, teams decided to throw more resources — both hardware and developers — at the problem.
Pretty quickly, it became apparent that this was an expensive and rigid path. Capacity could be added over time, often through laborious acquisition processes, but the approach did not allow for dynamic scaling up and down as workload requirements changed. Solutions were often proprietary and not cloud-friendly.

Eventually, database administrators, systems architects, and application developers piled into proverbial whiteboarding rooms to solve the problem outside of the database, using code. The concept of sharding, or partitioning, in which a single logical dataset is split and stored in multiple databases to increase total storage capacity and handle additional requests, eventually made its way into different database systems. But sharding has proved to be a complex and brittle approach that is difficult to maintain, levies a high engineering cost, allows only limited transactions, and carries query restrictions. The search for true elasticity continued.
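
To make the brittleness of sharding in application code concrete, here is a minimal sketch in Python. It assumes the simplest possible routing rule, a hash of the key modulo the number of shards; the function names and the `customer-N` keys are hypothetical and do not come from any particular product. It measures how many keys land on a different shard when one shard is added.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard: stable CRC32 hash modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical keys, used only for illustration.
keys = [f"customer-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys change shards when growing from 4 to 5 shards")
```

With this naive rule, most keys map to a different shard after the change, so the application has to migrate that data itself while still serving requests. That is one reason hand-rolled sharding is costly to maintain.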

Then, about ten years ago, came a movement to behold: the NoSQL boom, in which frustrated innovators offered an alternative to relational databases. Technically, this non-tabular approach has been around since the late 1960s, but it had not been directed at business use cases in any widespread way. The NoSQL approach of the early 21st century allowed the database industry to take a step back and completely rethink how data is stored and accessed in an effort to solve scalability problems.

While the approach did eliminate some database and data ingestion problems, there was a price. NoSQL meant giving up many of the key things that made relational databases so useful, such as enforcing data integrity and leveraging ACID transactions. The primitive querying, complex data modeling, subpar data integrity, limited or nonexistent ACID transaction support, and specialized tools required made NoSQL a no-go for most enterprises and small to midsize businesses.

The Relational Revolution: Distributed SQL

While NoSQL marked a genuine innovation in database options, distributed SQL is the revolution realized. It offers the best of SQL and NoSQL, of relational and non-relational database advantages, especially for high-performance scaling. A distributed SQL database is a single logical database made up of multiple database instances, or nodes.

The idea is that developers can simply add nodes to this cluster, or remove nodes from it, on demand to accommodate changes in storage and access needs. As the name suggests, the data itself is distributed: split into partitioned slices and replicated across multiple nodes. Distributed SQL implements a shared-nothing architecture, in which nodes do not share memory or storage and a single node in the cluster satisfies each update request. That eliminates the contention that arises when multiple nodes access the same memory or storage.
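
The slice-and-replica idea can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `Cluster` class, the round-robin placement rule, the slice and replica counts, and the node names are hypothetical and do not describe how Xpand or any other product places data. The point is only that a key maps to a slice, slices map to nodes, and adding a node changes the second mapping without touching the first.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy model: a fixed number of slices, each replicated on several nodes."""

    nodes: List[str]
    slice_count: int = 12
    replicas: int = 2
    placement: Dict[int, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.nodes = list(self.nodes)  # copy so the caller's list is not mutated
        self.rebalance()

    def slice_for(self, key: str) -> int:
        # The key-to-slice mapping does not depend on how many nodes exist.
        return zlib.crc32(key.encode("utf-8")) % self.slice_count

    def rebalance(self) -> None:
        # Round-robin placement: replica r of slice s goes to node (s + r) mod n,
        # so the copies of one slice always sit on different nodes.
        n = len(self.nodes)
        copies = min(self.replicas, n)
        self.placement = {
            s: [self.nodes[(s + r) % n] for r in range(copies)]
            for s in range(self.slice_count)
        }

    def add_node(self, name: str) -> None:
        self.nodes.append(name)
        self.rebalance()

    def remove_node(self, name: str) -> None:
        self.nodes.remove(name)
        self.rebalance()

    def nodes_for(self, key: str) -> List[str]:
        """Nodes holding a copy of the slice that contains this key."""
        return self.placement[self.slice_for(key)]


cluster = Cluster(nodes=["node-a", "node-b", "node-c"])
slice_before = cluster.slice_for("order-1001")
cluster.add_node("node-d")
slice_after = cluster.slice_for("order-1001")
print(slice_before == slice_after, cluster.nodes_for("order-1001"))
```

The round-robin rule in this sketch moves more slices than necessary when the cluster changes. A real system would presumably try to minimize data movement, and that optimization is left out here.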

Distributed SQL is a relational database win-win. The technology’s innovations are based on lessons learned over the past thirty or so years to deliver true dynamic elasticity. The modern benefits of dynamic elasticity include the ability to add or remove nodes simply, quickly, and on demand. The approach is self-managing: it can automatically rebalance data across nodes while maintaining extremely high continuous availability through automatic failover. And of course, the approach includes all of the features that make relational databases so powerful, like the ability to use standard SQL (including JOINs) and to maintain ACID compliance.
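
As a reminder of what standard SQL with JOINs and ACID transactions offers an application, here is a small sketch using Python's built-in `sqlite3` module. SQLite is a single-node, embedded database and is used here only as a stand-in so the example runs anywhere; it is not a distributed SQL database. The `accounts` and `transfers` tables and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers (
        id INTEGER PRIMARY KEY,
        from_id INTEGER NOT NULL REFERENCES accounts(id),
        to_id INTEGER NOT NULL REFERENCES accounts(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO accounts VALUES (1, 'alice', 100), (2, 'bob', 50);
    """
)


def transfer(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> bool:
    """Move money atomically: either every statement applies or none does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
            conn.execute(
                "INSERT INTO transfers (from_id, to_id, amount) VALUES (?, ?, ?)",
                (from_id, to_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)
rejected = transfer(conn, 1, 2, 500)
balances = conn.execute("SELECT owner, balance FROM accounts ORDER BY id").fetchall()

# A standard JOIN: resolve both account ids in each transfer to owner names.
history = conn.execute(
    """
    SELECT src.owner, dst.owner, t.amount
    FROM transfers AS t
    JOIN accounts AS src ON src.id = t.from_id
    JOIN accounts AS dst ON dst.id = t.to_id
    """
).fetchall()
print(ok, rejected, balances, history)
```

The promise described above is that the same kind of statements keep working, with the same guarantees, when the tables are spread across many nodes.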

A distributed SQL option like MariaDB’s Xpand is architected for all nodes to work together to form a single logical, distributed database that all applications can point to, regardless of the intended use case. Whether a business needs a three-node cluster for modest workloads or hundreds, even thousands, of nodes for much larger ones, distributed SQL means deployments can grow or shrink on demand. A system could combine thousands of cores, terabytes of memory, and even petabytes of storage, all operating as a single logical database. And that database is capable of handling millions or tens of millions of queries per second, without sacrificing data integrity or continuous high availability.

A lot of things are deemed revolutionary or game-changing these days, but distributed SQL actually deserves those monikers. It’s already helping businesses leverage true elasticity on commodity hardware, on-premises and in the cloud, in a cost-effective way. Taking advantage of the possibilities that massive data yields by harnessing elasticity is the future of business. And that can only happen by adopting the future of databases: distributed SQL.

Feature image by Gerd Altmann from Pixabay.