YugaByte: A New Database to Solve the SQL vs. NoSQL Dilemma
Companies spend too much time weighing the whole SQL vs. NoSQL conundrum, rather than focusing on the cloud-native applications they’re building with the mission-critical data these systems are designed to store.
This is the argument, anyway, that the founders of the YugaByte database have put forth. They want to simplify the data layer in the way Docker and Kubernetes have for the application tier. Yuga means Era or Epoch in Hindi, indicating the founders’ long-term view of their offering. The company emerged from stealth last week.
“We’re not really like a SQL database or NoSQL database; we’re really simplifying the data infrastructure mishmash every enterprise has pulled together,” said founder and CEO Kannan Muthukkaruppan.
Nine members of the company’s 25-plus-employee team came from Facebook, where they worked during its transition from MySQL and Memcache to its in-house solution called TAO. They’re committers to the HBase project, which was also heavily used at Facebook.
Other tech giants have developed their own internal solutions to the problems YugaByte addresses, such as Google with Bigtable and Spanner, and Pinterest with Zen.
Most enterprises still use legacy SQL, which doesn’t provide the agility of the cloud and doesn’t have all the access patterns you’d need to build features in modern applications such as machine learning, time series, and Spark integration, co-founder Karthik Ranganathan said. This is especially true for applications such as fraud detection, recommendation engines and the Internet of Things, which require scalable infrastructure that’s resilient to failure, geo-distributed, and portable across clouds.
NoSQL databases each tend to specialize in a single access pattern, such as MongoDB’s focus on the document model and Cassandra’s on high-volume workloads. And NoSQL is not fully designed for mission-critical applications, he said, because it can be inconsistent and compromise data integrity.
The “mishmash” Muthukkaruppan is referring to usually involves multiple data centers; a sharded SQL setup, replicated in a master-slave configuration, to keep critical data; and often a NoSQL setup for the alternate access patterns (time series, graph, flexible-schema document, etc.). Then there’s invariably a cache to enable low-latency access, which is also manually sharded and replicated.
The app has to figure out which pieces of data are absolutely critical and need to go to MySQL, how pieces of data are accessed and which can go to NoSQL, and which subset of that data has to go in the cache for low-latency access. This depends on usage patterns rather than application architecture, and because usage patterns change, the data layout has to evolve with them. Then you have to figure out how to replicate data between the MySQL master and slaves or across the cache cluster.
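To make that burden concrete, here is a minimal sketch of the kind of routing logic an application ends up carrying in such a setup. The `Record` fields, the store names, and the `route` function are all hypothetical, invented for illustration; they are not taken from YugaByte or from any particular company’s stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    key: str
    critical: bool        # must never be lost (for example, a payment)
    access_pattern: str   # "relational", "time_series", "document", ...
    hot: bool             # read often enough to need low-latency access


def route(record: Record) -> List[str]:
    """Return the (hypothetical) stores this record must be written to."""
    stores = []
    # Critical or relational data goes to the sharded SQL tier.
    if record.critical or record.access_pattern == "relational":
        stores.append("sharded_sql")
    else:
        # Everything else goes to whichever NoSQL store fits the pattern.
        stores.append("nosql")
    # Frequently read data is also copied into the cache.
    if record.hot:
        stores.append("cache")
    return stores


print(route(Record("order:1", True, "relational", True)))
print(route(Record("sensor:9", False, "time_series", False)))
```

Every rule in `route` encodes an assumption about how the data is used today, which is why this kind of code has to be revisited whenever usage patterns shift.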
All this comes at a huge cost in development and operations to keep this running, and these are not even people working on the application. Because the system is so fragile, it inevitably means inconsistency and data loss, which involves hours of debugging, he explained.
Addressing Pain Points
Traditionally, you would have chosen a different silo for each access pattern — flexible schema, key-value, time series, relational. YugaByte is built on a common core with each of those access patterns on top.
The core is a scale-out data fabric. It’s built on Raft as the consensus protocol, which allows for strongly consistent replication and zero data loss. It uses DocDB, YugaByte’s proprietary document-oriented storage format built on a heavily customized form of RocksDB, which provides low-latency access and high data density. It exposes popular, well-known APIs.
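The consistency claim rests on Raft’s majority rule: a write counts as committed only once a majority of replicas have stored it, so any later majority is guaranteed to include at least one replica that has it. The sketch below illustrates that rule in general terms; it is not YugaByte’s code, the function names are made up for this example, and it omits the additional Raft requirement that a leader only commits entries from its own current term.

```python
from typing import List


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains."""
    return replicas - majority(replicas)


def commit_index(match_index: List[int]) -> int:
    """Highest log index stored on a majority of replicas.

    match_index holds, for every replica including the leader,
    the last log index known to be stored on that replica.
    """
    ordered = sorted(match_index, reverse=True)
    return ordered[majority(len(ordered)) - 1]


# Three replicas: the leader is at index 7, followers at 5 and 3.
# Index 5 is on two of three replicas, so it is safe to acknowledge.
print(commit_index([7, 5, 3]))
print(tolerated_failures(3), tolerated_failures(5))
```

With three replicas the group survives one failure, and with five it survives two, without losing any acknowledged write.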
“We don’t want to innovate and bring the world the pain of learning one more database API because there’s enough out there. … You’ll be able to take an application written for Cassandra or Redis today and point it at YugaByte, and it would just work. We’re fully compatible with the drivers; you don’t need to change anything,” Ranganathan said. SQL support is in the works.
Because all of these APIs sit on a common data platform, YugaByte provides automated sharding and load balancing, and handles failover and replication under the hood.
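As a rough illustration of what automated sharding takes off the application’s plate, the sketch below hashes each key to a fixed shard and spreads the shards across nodes. This is a generic hash-sharding scheme written under stated assumptions, not a description of YugaByte’s internals; `NUM_SHARDS`, the node names, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List

NUM_SHARDS = 16  # hypothetical fixed shard count


def shard_for(key: str) -> int:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def assign_shards(nodes: List[str]) -> Dict[int, str]:
    """Spread the shards round-robin across the available nodes."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_SHARDS)}


def node_for(key: str, assignment: Dict[int, str]) -> str:
    """Find the node that currently owns the shard for this key."""
    return assignment[shard_for(key)]


assignment = assign_shards(["node-a", "node-b", "node-c", "node-d"])
print(node_for("user:42", assignment))
```

Because a key’s shard never changes, adding a node only requires moving whole shards and updating the assignment map, which a database can do without the application noticing.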
It runs on any cloud provider, and with the integrated APIs, people wanting to build applications on top of it can do so quickly.
It addresses the pain points for NoSQL by bringing strong consistency, operational ease, and ACID transactions. To the SQL world, it provides automatic sharding, automatic load balancing, and cloud-native operation. And it offers sync/async replicas, hybrid cloud apps and zero data loss for multi-datacenter needs.
The Sunnyvale, Calif.-based company was founded in 2016 by Muthukkaruppan, Ranganathan and Mikhail Bautin. It has raised $8 million in Series A funding from Lightspeed Venture Partners. It offers an open-source Community edition and a fully supported Enterprise edition.
It has eight to 10 customers in its early access program and expects to reach general availability within the first four months of 2018.