
Databases at Scale Part One: The Real-Time, In-Memory Game

13 Jan 2015 10:36am

This three-part series looks at current advances in database scalability for real-time transactions. Part One focuses on the new generation of database services that are emerging to provide real-time transactional processing at scale. Part Two reviews industry benchmarks that show how transactional processing at scale continues to gain power at high volume levels. Part Three will focus on what steps businesses can take to manage early scalability while building a future infrastructure that can manage the growth trajectory.


Industries are increasingly turning to hybrid transactional/analytical processing databases that can manage data in memory. These databases are part of the architecture stack that is powering the rise of enterprise apps and enabling the growth of hyperlocal mobile products — in which personal preferences, a customer’s current location, external factors like weather and time, and a business’s supply chain all interact in real-time in order to enable a purchase or provide a service.

Now, businesses are finding that, as they scale, they come to rely on their database architecture's capacity to keep datasets in sync and to enable microsecond processing that does not disrupt the end-user experience, yet can carry out behind-the-scenes analytics to ensure that risks are assessed and interpreted in real time.

Recent weeks have seen a new competitive drive among database vendors keen to demonstrate their ability to meet the market’s growing demand for low-latency, high-volume transactional database capabilities.

Basho

Adam Wray, CEO of Basho — maker of Riak CS and Riak Enterprise — sees the main database competition as being among Basho itself, MongoDB, Cassandra and Couchbase. “These are the only four companies that have scale,” Wray says, pointing to the tens of millions in revenue, the size of venture funding, and the types of customers using their scalable database architectures.

“If you think about the overall marketplace, we have three main drivers of unstructured data,” says Wray, explaining the huge growth in distributed database architecture providers. “Big data — which has now become the purview of mid-tier as well as Fortune 500; distributed workflows of the geographical variety; and the Internet of Things with machine-to-machine communications. All three of these factors feed our business.”

At the enterprise production level, Wray believes the reason Basho is seeing 88 percent annual growth rates, and attracted $25 million in Series G funding this week, is that large enterprise clients are picking one use case at production level and, once they begin to see the simplicity and power of Riak’s distributed database architecture, are increasing the number of scenarios where they make use of unstructured, transactional data at scale. This was the case for the UK’s National Health Service, for example, which began using Riak to manage profile and prescription systems. Now the database platform manages data on 80 million UK health care patients and keeps their episodes of care and treatment histories in sync as they move between over 20,000 NHS-affiliated health care services.

“The types of clients are more than just the traditional gaming or media type companies but are government and health and medical. Clients have traditionally been looking at this, and in the last 18 months they have been getting serious about production support. We think there is an opportunity: our clients are getting serious about the production level. We are at the core of the ‘land and expand’ model. Our contracts are going from an average of one to two years, from a couple of hundred thousand to a couple of million in average contract size. It is all about latency, availability and performance,” says Wray.

Redis Labs Releases New Sharding Technology

One of the most popular large-scale databases for managing application data is the open source Redis. “Redis is a key-value database, and at the same time it is often considered a data-structure engine,” said Ofer Bengal, CEO of Redis Labs, a commercial provider built on the Redis open source foundations.

“In terms of early adopters of Redis, companies such as Twitter and Stackoverflow have been using Redis for quite a while: for example, Twitter has several data centers each with hundreds of terabytes of data in Redis.

“When you look at the vertical markets for Redis, they come from cloud, finance, gaming, advertising, online travel, e-commerce, and in terms of use cases, we see Redis used in job scheduling, bookings management, and transaction processing.”
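Bengal’s distinction between a plain key-value store and a data-structure engine can be illustrated with a plain-Python analogy. This is only a sketch using dictionaries and lists — not the Redis API itself — and the key names are hypothetical; the point is that in a data-structure engine the server understands and manipulates the value’s structure, rather than treating it as an opaque blob:

```python
# Plain key-value: each key maps to one opaque value.
kv = {}
kv["user:42:name"] = "Ada"

# Data-structure engine: the value itself is a structure the server
# can operate on. Redis offers lists, sets, sorted sets and hashes;
# here a Python list stands in for a Redis list used as a job queue.
structures = {"queue:jobs": []}
structures["queue:jobs"].append("job-1")  # analogous to RPUSH
job = structures["queue:jobs"].pop(0)     # analogous to LPOP
```

Because the server knows the structure, operations like pushing to a queue or incrementing a ranked score can happen in place, without the client reading, modifying and rewriting the whole value.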

Redis Labs currently offers a database-as-a-service product, with a freemium model for up to 25 GB of data, then metered every hour based on customer usage. The company is currently working on its second product: “enterprise software that can be downloaded and stored on any server, or public cloud, so the customer operates the database. This is now in beta-testing,” says Bengal, who expects a release date this month.

Chargify, the Bleacher Report, HotelTonight and Docker have all adopted the paid Redis Labs DBaaS to manage aspects of their data architecture.

Key to how Redis Labs enables high-volume data transactions is its sharding technology. For its DBaaS customers, Redis Labs has recently released new features that help increase transactional processing to a rate of around 1.2 million operations per second.

Bengal says that in the past, sharding did not give database administrators and application developers enough control over how a database is split. The new features allow users to use regular expressions to define the database shards.
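The mechanics of regex-defined shards can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Redis Labs’ actual rule syntax or implementation; the rule patterns and shard names are hypothetical:

```python
import re

# Hypothetical shard rules: a regex per shard, checked in order.
SHARD_RULES = [
    (re.compile(r"^user:"), "shard-users"),
    (re.compile(r"^session:"), "shard-sessions"),
]
DEFAULT_SHARD = "shard-misc"

def shard_for(key: str) -> str:
    """Route a key to the first shard whose regex matches it."""
    for pattern, shard in SHARD_RULES:
        if pattern.match(key):
            return shard
    return DEFAULT_SHARD
```

The benefit over plain hash-based sharding is control: an administrator can guarantee that related keys (say, everything prefixed `user:`) land on the same shard, so multi-key operations on them stay local to one machine.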

The priority for Redis Labs in introducing sharding technology has been the need to respond to increasing customer requests. Enterprises are increasingly turning to in-memory database services like Redis to manage large-scale databases that handle millions of transactions per day from around the world.

The basic scaling dilemma for all databases is that once a database gets larger than a single machine, “then any cross-machine operation becomes a problem,” explains Bengal.
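Why cross-machine operations become a problem can be seen in a small sketch of hash-based key routing. This is an illustration under assumed conventions (four shards, CRC32 routing), not any vendor’s implementation: once keys are spread across shards, a single multi-key operation may need to contact several machines, which costs network round trips and loses single-node atomicity:

```python
import zlib

N_SHARDS = 4

def shard_of(key: str) -> int:
    # Deterministic hash so every client routes a key the same way.
    return zlib.crc32(key.encode()) % N_SHARDS

def shards_touched(keys):
    """The set of shards a multi-key operation must coordinate across."""
    return {shard_of(k) for k in keys}
```

A single-key read or write touches exactly one shard, but an operation over many keys can span several — and any transaction spanning more than one shard needs cross-machine coordination.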

NuoDB: Distributed Database Architecture to Enable Hybrid Transactional Processing

NuoDB is database software built for deployment on multiple machines to create a distributed database architecture, whether running on-premise, or on private or public cloud servers. The company has announced new features as part of NuoDB Swifts Release 2.1 to enable “low-latency Hybrid Transaction/Analytical Processing (HTAP) capabilities”.

NuoDB CTO Seth Proctor is seeing growth in clients needing greater transactional capabilities from in-memory databases. “The keys for our customers are fast answers to what’s happening live in a system and not being required to maintain multiple replicated copies of the data in separate services to gain insights,” Proctor says.

Proctor says the challenges come into play when working with massive datasets that cannot be contained in a single machine: “It’s exactly that distributed architecture that lets NuoDB address the traditional challenge of handling volume and mixed workloads without increasing latency. As you need to scale the number of operational transactions or increase the complexity of the analysis, you can add more in-memory peers to take on the work. These in-memory peers will form caches based on the types of workloads so resources are managed effectively.”

NuoDB’s solution, however, offers an alternative to sharding a database:

“Sharding, by definition, means that you’re splitting a single database into multiple, disjointed ones. As a result, you limit the scope of any single transaction, lose global consistency and have disjointed services to manage, and storage points to maintain and backup. In the face of failure, it’s very hard to understand the overall state of the system. You definitely cannot run transactionally consistent analysis on the entire data set.

“With NuoDB, you scale-out while still maintaining a single, logical database. There is no limit to what a transaction can do, and no requirements on developers to encode operations assumptions in the application logic. So when you need to change the deployment model, it’s a simple operations decision. Managing durable state is much simpler as are the failure models. Because of these factors, development time is shorter, costs are lower, provisioning and management is easier and it’s possible to take existing applications and migrate them with few or no changes.”

In-Memory Database Architecture: A Key Growth Sector

Database architectures that can handle real-time transactions at global scale are still in their infancy, and are only just introducing the sorts of features that make them reliable across hybrid configurations of on-premise and cloud deployments.

As the Internet of Things expands, the API economy balloons, and more businesses adopt a mobile-first approach to designing applications, the need for data architecture that can manage transactional speed while maintaining data integrity is set to become a business norm.

Part Two of The New Stack’s overview of new database platforms at scale looks at some of the latest industry benchmarks showing transactional processing speed at high volume, with low latency.

Basho is a sponsor of The New Stack.

Feature image via Flickr Creative Commons.
