How Real-Time Databases Reduce Total Cost of Ownership
To meet the requirements of the stringent service-level agreements (SLAs) in today’s businesses, real-time applications must provide predictable performance at any scale, with low latency and at an affordable cost. To do so, they must respond within a tight timeframe and therefore require reliably fast access to all data. The cost, scalability and manageability of a real-time database that manages such data depend on its design. This article discusses specific aspects that are key to efficiency.
Real-Time Application SLAs
A real-time application’s SLA can be defined as the 99th percentile of requests being served within a specified time window and can vary widely for different real-time applications. At the high end of the SLA spectrum are extremely time-sensitive applications like high-frequency trading, where nanoseconds matter in responding to changes in market conditions to exploit arbitrage opportunities. In such cases, techniques like placing the logic and critical data on custom-designed chips and minimizing the signal path to the servers are necessary. At the lower end of the spectrum may be equipment monitoring applications with an SLA of minutes to raise alerts.
The vast majority of user-facing real-time applications reside in the middle realm. These interactive applications must serve a request within human experience timeframes, typically within 1-2 seconds. Upstream drivers of experience delivery, such as personalization, recommendations and ads, have much tighter SLAs. For example, an ad server must respond with a bid within tens of milliseconds. Similarly, fraud detection must complete in hundreds of milliseconds to maintain a good user experience. These drivers typically need multiple data accesses to process a request, so data access in the millisecond range is required to meet the broader SLA.
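The arithmetic behind the millisecond requirement can be sketched with a rough latency budget. The deadline, access count and overhead below are illustrative assumptions, not figures from any particular system:

```python
# Rough latency-budget arithmetic: if an ad server must bid within ~50 ms
# and serving one request takes ~20 sequential data accesses (both assumed
# numbers), each access must complete in low single-digit milliseconds.
SLA_MS = 50             # assumed end-to-end bid deadline
DATA_ACCESSES = 20      # assumed data accesses per request
OTHER_OVERHEAD_MS = 10  # assumed compute and network overhead

per_access_budget = (SLA_MS - OTHER_OVERHEAD_MS) / DATA_ACCESSES
print(f"{per_access_budget} ms per data access")  # prints "2.0 ms per data access"
```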
User-facing applications increasingly need to handle a large number of users and leverage large amounts of relevant data as well. They may need to scale up to unexpected volumes quickly due to social media or market movements. Therefore, in addition to a millisecond-range data-access latency SLA, they also have stringent requirements for the volume of data and the ability to handle rapid increases in throughput and data.
Pillars of Effective Performance and Scalability
The following factors drive efficiency and are described in the sections below:
- Data storage: The data storage hardware must support the necessary speed and scale efficiently.
- Database design: The database design must exploit the available resources most efficiently.
- Operational aspects: The system must be easy to scale, resilient and easy to maintain.
Many modern real-time databases store data in volatile memory (DRAM) to deliver sub-millisecond data access. Such solutions can become very expensive to provision and maintain for large data volumes. Since a single node has a limited amount of DRAM (typically up to a few terabytes in high-end servers), a multinode cluster is required for larger data volumes and for meeting reliability goals.
The Aerospike Database, based on hybrid memory architecture (HMA), on the other hand, stores data in solid-state disks (SSDs) as fast high-density storage to deliver a millisecond-range latency over large data volumes. Another key advantage of SSD-based storage is that SSDs provide persistent storage. Data stored in a volatile DRAM must be replicated elsewhere and be repopulated to DRAM during recovery. Repopulating large volumes of data from a persistent or remote replica can lead to system unavailability during a slow restart.
Let’s compare the cost characteristics of data storage with SSDs versus DRAM.
Cost of Large Clusters
Large data volumes of hundreds of terabytes to petabytes will require a proportionately large cluster of hundreds to thousands of nodes. Take, for example, a reserved instance r6g.16xlarge on AWS with 512GB DRAM. Its monthly cost is about $1,500. Assuming 100TB of data, a 200-node cluster costs almost $300,000 per month.
Compare this to the hybrid memory case, where the primary index resides in DRAM and the data is kept on SSD. Assuming an index-to-data size ratio of 1:30 (a 64-byte index entry per 2,000-byte record), the DRAM needed for the index would be about 1/30 the size of the SSD data. In a cloud deployment, consider an AWS i4i.8xlarge instance that provides 256GB DRAM and 7.5TB SSD. The monthly cost of a reserved instance is about $1,300, and a cluster of 14 nodes (100TB cluster / 7.5TB per node) would cost about $18,000 per month, a savings of about 94% over the pure DRAM database.
If a cluster is hosted on premises, comparing DRAM and SSD prices gives a good approximation of the hardware cost difference, as memory is a significant portion of the total hardware cost. SSD prices vary by brand, interface and other factors, but as of this writing they are commonly less than $1 per gigabyte, whereas DRAM prices are about $5 and above per gigabyte (source). Thus, a hybrid memory database has roughly one-fifth the storage cost of a pure DRAM-based database. Assuming memory is about 50% of the total server cost, a hybrid database provides more than a twofold cost advantage over a pure DRAM-based database.
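The cloud cost comparison above can be reproduced with a back-of-the-envelope calculation. Instance sizes and the reserved-instance prices are the article's assumptions, not current AWS quotes:

```python
# Back-of-the-envelope TCO comparison: a pure-DRAM cluster vs. a
# hybrid-memory (DRAM index + SSD data) cluster for 100TB of data,
# using the assumed reserved-instance prices from the text.
import math

DATA_TB = 100  # total data volume

# Pure DRAM: r6g.16xlarge, 512 GB DRAM, ~$1,500/month (assumed)
dram_nodes = math.ceil(DATA_TB * 1024 / 512)   # 200 nodes
dram_cost = dram_nodes * 1500                  # $300,000/month

# Hybrid: i4i.8xlarge, 256 GB DRAM + 7.5 TB SSD, ~$1,300/month (assumed)
hybrid_nodes = math.ceil(DATA_TB / 7.5)        # 14 nodes
hybrid_cost = hybrid_nodes * 1300              # $18,200/month

savings = 1 - hybrid_cost / dram_cost
print(f"DRAM cluster:   {dram_nodes} nodes, ${dram_cost:,}/month")
print(f"Hybrid cluster: {hybrid_nodes} nodes, ${hybrid_cost:,}/month")
print(f"Savings: {savings:.0%}")  # prints "Savings: 94%"
```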
Overheads in Large Clusters
A pure DRAM-based database results in a large cluster, which brings other inefficiencies. As the cluster grows, its operational characteristics deteriorate because the overhead of maintaining the cluster increases.
All nodes in a cluster need to communicate constantly with each other with heartbeat messages to quickly establish the state of other nodes and membership within the cluster. The overhead of such cluster protocols grows quadratically with cluster size, as the inter-node messages have an O(N^2), or second-order, dependency on the number of nodes. Thus, the overhead becomes significant in large clusters, and larger clusters become increasingly less resource-efficient as a higher percentage of resources is consumed in the cluster's housekeeping tasks.
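The O(N^2) growth is easy to see by counting messages. A minimal sketch, assuming an all-to-all heartbeat scheme in which every node pings every other node each round:

```python
# All-to-all heartbeat traffic grows quadratically with cluster size:
# each of the n nodes sends a heartbeat to each of the other n-1 nodes,
# so messages per round scale as O(n^2).
def heartbeat_messages_per_round(n_nodes: int) -> int:
    return n_nodes * (n_nodes - 1)

for n in (14, 50, 200):
    print(n, heartbeat_messages_per_round(n))
# A 200-node cluster generates over 200x the heartbeat traffic
# of a 14-node cluster (39,800 vs. 182 messages per round).
```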
The efficiency of the popular scale-out model is based on commodity hardware that is cheap and easy to replace. This works well for stateless clusters, such as web servers, that keep state in persistent stores like network file systems and databases. Those persistent stores, in turn, must replicate data to guard against failures. Replication not only increases hardware cost and update overhead; stateful instances are also harder to replace, because recovery from failures, as well as maintenance and scaling events, requires reinstating data. Most such events are handled without manual intervention through robust database design and automated operational procedures, but some fraction will require manual oversight. The frequency of such events, and thus of manual intervention, increases with cluster size, leading to higher maintenance overhead.
Balanced Data Distribution
If the data is randomly and uniformly distributed among the cluster nodes, the request load is also spread evenly, and resources on all nodes are utilized uniformly. This leads to superior efficiency.
Not only is cluster efficiency optimized, but the cost and errors of manual data redistribution during cluster changes and failures are also avoided. Manual redistribution can lead to unavailability and even data loss. With automatic rebalancing, uniform data distribution is maintained at all times, which ensures greater resource efficiency and availability.
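One common way to achieve such uniform distribution is to hash each key to one of a fixed number of partitions and assign partitions evenly across nodes. The sketch below assumes 4,096 partitions (the number Aerospike uses, for example); the node names and round-robin assignment are illustrative:

```python
# Minimal sketch of hash-based data distribution: keys hash to a fixed
# set of partitions, and partitions are spread evenly across nodes.
import hashlib

N_PARTITIONS = 4096  # fixed partition count (Aerospike, e.g., uses 4,096)

def partition_of(key: str) -> int:
    # uniform hash of the key into a partition ID
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS

def node_for(partition: int, nodes: list) -> str:
    # round-robin partition-to-node map; a real database derives this
    # map from cluster membership and rebalances it on node changes
    return nodes[partition % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
counts = {n: 0 for n in nodes}
for i in range(100_000):
    counts[node_for(partition_of(f"user:{i}"), nodes)] += 1
print(counts)  # each node holds roughly one third of the keys
```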
Efficient Server Processing
Request processing should be streamlined at various points to minimize synchronization conflicts and overhead. Efficient request processing entails many aspects, including:
- Direct client connection to the node holding the data, without going through a coordinator node.
- Minimal locking of critical data structures through specific techniques and conventions.
- Streamlined flow with minimal contention across multiple network cards, CPU cores and data partitions.
- Compression of data for disk storage and network traffic.
Enhanced Fault Tolerance
While data replication is necessary for fault tolerance and read performance, it also adds to data volume, cluster size and update overhead. A typical system is designed with a minimum of two replicas to handle one node failure, on the reasonable assumption that two concurrent node failures are extremely rare because node recovery time is much smaller than the mean time between node failures (MTBF). The probability of a second node failing during the recovery window of a failed node, however, increases linearly with the cluster size.
The probability of a given node failing in a time window of duration T is approximately:

T / MTBF

So, once one node has failed, the probability of a second node failing during its repair window in a cluster of N nodes is approximately:

(N - 1) x (MTTR / MTBF)

where MTTR is the mean time to repair.
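Plugging in illustrative numbers makes the linear growth concrete. The MTBF and MTTR values below are assumptions chosen for the sake of the example:

```python
# Probability that a second node fails while an already-failed node is
# being repaired, approximated as (N - 1) * MTTR / MTBF.
MTBF_HOURS = 2 * 365 * 24  # assumed per-node mean time between failures (2 years)
MTTR_HOURS = 1             # assumed mean time to repair

def p_concurrent_failure(n_nodes: int) -> float:
    # chance that at least one of the remaining n-1 nodes fails
    # during the repair window (first-order approximation)
    return (n_nodes - 1) * MTTR_HOURS / MTBF_HOURS

for n in (14, 200):
    print(n, f"{p_concurrent_failure(n):.2%}")
# the risk for a 200-node cluster is roughly 15x that of a 14-node cluster
```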
If there are two overlapping node failures, some portion of the data becomes unavailable, specifically the data with both replicas on the failed nodes. To reduce the possibility of data unavailability, databases can dynamically preserve the replication factor (RF) by recreating copies of the failed node's data on other nodes. Such dynamic replication is sometimes referred to as migration. As long as the time it takes to populate a new copy is much smaller than MTTR, this reduces unavailability due to concurrent failures. The cost of data transfer during migrations is justified as long as the amount of data per node and the network speed allow such migration to be fast.
Unnecessary migrations should be avoided when the recovery time is known to be less than the migration time, as can be the case for routine maintenance tasks. In such cases, it should be possible to disable the automatic migration for a node under maintenance for the duration of the maintenance.
Multiple hops from the client to the correct server node with data can be avoided if the client library keeps track of the data placement in the cluster and transparently handles cluster transitions.
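A minimal sketch of such single-hop routing follows. The client caches a partition-to-node map learned from the cluster and resolves each key locally; the class name, map format and addresses are illustrative, not a real client API:

```python
# Single-hop routing sketch: the client keeps a cached partition-to-node
# map and sends each request straight to the owning node, with no
# coordinator in the request path.
import hashlib

N_PARTITIONS = 4096

class SmartClient:
    def __init__(self, partition_map: dict):
        # refreshed by the client library whenever cluster topology changes
        self.partition_map = partition_map

    def node_for_key(self, key: str) -> str:
        digest = hashlib.sha1(key.encode()).digest()
        partition = int.from_bytes(digest[:4], "little") % N_PARTITIONS
        return self.partition_map[partition]

# toy map: partitions alternate between two nodes
pmap = {p: ("10.0.0.1" if p % 2 == 0 else "10.0.0.2") for p in range(N_PARTITIONS)}
client = SmartClient(pmap)
print(client.node_for_key("user:42"))  # resolved locally, no extra hop
```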
Ease of Scaling
If a database does not scale linearly, fast-growing throughput and data needs can become performance, reliability and cost issues, not only because large clusters are costly and difficult to support, but also because the process of scaling can be error-prone without automatic rebalancing, as discussed above.
When the database cannot meet the growing needs of a real-time application, switching to a more scalable alternative can be costly and painful. Such a transition involves data migration, application rewrites and architecture/design changes, such as a potential caching layer with its operational complexity and consistency issues. The old system must continue to work while the new system is being brought online, and the switchover must happen seamlessly. Therefore, choosing a system that can scale indefinitely from the start is likely much more efficient and less risky.
Syncing Globally Distributed Data
It is operationally efficient to have a database that allows clusters to be globally distributed and data to be kept in sync either in a tight synchronous fashion or in a near-real-time async fashion, depending on the need.
Automating Operational Tasks
For a complex production system, the higher the degree of automation, the better. A Kubernetes Operator is a sophisticated way to automate operational tasks and can significantly improve operational efficiency and system reliability.
A vast majority of interactive applications require data access in the millisecond range. For applications needing large amounts of data, SSDs provide fast, high-density storage at a much lower cost than DRAM. Storing data on SSDs results in a smaller cluster, which brings cost benefits, operational simplicity and reliability. A modern real-time database must be designed for the most efficient use of resources, with uniform data distribution, automatic rebalancing, streamlined processing, single-hop access from the client, and data compression for storage and transfer. Aerospike is designed to provide the lowest total cost of ownership (TCO) through these efficiencies.