Aerospike Database 6: Secondary Index Queries, JSON and More
Aerospike Database 6, our newest database server release, became generally available on April 27 and is full of exciting developer features.
Two years ago, we delivered a vastly improved Cross-Datacenter Replication (XDR) subsystem in Aerospike Database 5. It enabled our customers to create high-performance, geo-distributed applications with fine-grain control over the distribution of their data. Our newest database release builds on these features and reflects our increased focus on queries. Combined with Aerospike Connect for Spark and Aerospike Connect for Presto (Trino), the Aerospike Data Platform enables our customers to serve low-latency transactional and analytical workloads against their large data sets.
We will cover three primary capabilities in this blog post. First, we will discuss the new query capabilities in our 6.0 release. Second, we’ll look at support for document data models through the Document API and our secondary index work. Finally, we’ll cover the completion of batch operations, including batch writes, which improve efficiency through pipelined operations.
Server version 6.0 comes after seven release candidates and is the culmination of 14 months of engineering effort. Aerospike Database 6 is a significant release and includes breaking changes. Please review the release notes and the special upgrade instructions related to the new storage format and new secondary index query capability.
Partitioned Secondary Index Queries
The path to the new query subsystem started in the last two releases of Aerospike Database 5.
Server version 5.6 added set indexes, an optional index type that improves performance for a special kind of query. Using set indexes enables low-latency access to all the records of a small set that lives inside a large namespace. In addition, like the primary index, set indexes support fast restart. Server version 5.7 delivered a 60% reduction in the memory consumed by secondary indexes and a new, highly efficient garbage collection system. Query performance and throughput were also improved.
Aerospike Database 6 builds on these changes with a new architectural approach aligned with the design of the primary index. The data in each Aerospike namespace is evenly distributed across 4096 logical partitions, which in turn are evenly distributed across the cluster nodes. The data for each partition is stored and indexed locally in multiple primary index sub-trees called sprigs, which enable primary index (PI) queries (formerly known as scans) to be massively parallelized.
A PI query can target all the partitions, a set of partitions or a single data partition. Leveraging this capability, the Spark and Presto (Trino) connectors can split a PI query into hundreds or even thousands of partitioned queries, feeding data to many thousands of workers in parallel and rapidly processing terabytes of data through horizontal scaling. This approach fits well with the architecture of these analytics systems. The combination creates a next-level distributed computing data platform.
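To make the partition arithmetic concrete, here is a minimal sketch of how a connector might divide the 4096 partitions among workers. The function name `split_partitions` and the (begin, count) range shape are assumptions made for illustration; this is not the connectors' actual code.

```python
from typing import List, Tuple

N_PARTITIONS = 4096  # logical partitions per namespace, as described above


def split_partitions(n_workers: int,
                     n_partitions: int = N_PARTITIONS) -> List[Tuple[int, int]]:
    """Return one (begin, count) partition range per worker.

    The ranges are contiguous, cover every partition exactly once, and
    differ in size by at most one partition.
    """
    if not 1 <= n_workers <= n_partitions:
        raise ValueError("n_workers must be between 1 and n_partitions")
    base, extra = divmod(n_partitions, n_workers)
    ranges = []
    begin = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional partition each.
        count = base + (1 if worker < extra else 0)
        ranges.append((begin, count))
        begin += count
    return ranges
```

Each worker would then run its own partitioned query over its range, so no two workers read the same partition.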
Before version 6.0, developers could only parallelize secondary index queries at the node level. This meant that if a cluster had 40 nodes, the best parallelization the Spark and Presto (Trino) connectors could use was 40 workers. As our customers make production use of Spark clusters with thousands of cores, having most of them sit idle was unacceptable, so these connectors did not implement support for secondary index queries.
In version 6.0, secondary indexes have been re-architected to index each partition separately. This enables massively parallelizing secondary index (SI) queries and supporting pagination, like PI queries. Furthermore, SI queries in version 6.0 are tolerant of rebalancing, unaffected by the automatic data migration that occurs when the cluster size changes. As a result, the Spark and Presto (Trino) connectors will implement SI query support in the same way they currently do PI queries. This opens the door for operators of Aerospike to optionally trade memory for performance improvements. By adding secondary indexes to sets with the right cardinality, SI queries can run at orders of magnitude better speeds than equivalent PI queries.
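The idea of indexing each partition separately can be illustrated with a toy in-memory model. Everything here is hypothetical: the `PartitionedIndex` class, the CRC32 stand-in hash and the dictionary layout are assumptions for the sketch and say nothing about the server's real data structures.

```python
import zlib
from collections import defaultdict

N_PARTITIONS = 4096


def partition_id(key: str) -> int:
    # Stand-in hash for illustration only; the server derives the
    # partition from the record's digest, not from CRC32.
    return zlib.crc32(key.encode("utf-8")) % N_PARTITIONS


class PartitionedIndex:
    """Toy secondary index that keeps a separate value map per partition."""

    def __init__(self):
        # partition id -> indexed value -> set of record keys
        self._by_partition = defaultdict(lambda: defaultdict(set))

    def put(self, key, value):
        self._by_partition[partition_id(key)][value].add(key)

    def query(self, value, partitions):
        """Return matching keys, consulting only the given partitions."""
        hits = []
        for pid in partitions:
            part = self._by_partition.get(pid)
            if part is not None and value in part:
                hits.extend(part[value])
        return sorted(hits)
```

Because each partition holds its own slice of the index, a query over partitions 0 to 2047 and another over 2048 to 4095 can run on different workers, and their combined results equal those of a single query over all partitions.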
The change in the secondary index architecture is reflected in the server’s query subsystem, which now unifies both types of queries, primary index and secondary index. The change reaches deep: into a common execution layer, into metrics that have been merged and renamed, and into the client API, which deprecates the Scan class and gives PI and SI queries the same rich functionality through a single Query class.
Partitioned queries are achieved through client-server coordination and require a new client version, such as Java client 6.0.0, C client 6.0.0, Go client 6.0.0, C# client 5.0.0 or Python client 7.0.0. Applications using previous releases of these clients may run against server 6.0 but will not benefit from the rebalance tolerance. Similarly, the new clients can talk to both server 5.x and server 6.0 nodes, but the new features are unlocked only once the cluster upgrade is complete.
Upcoming Query Features in Aerospike Database 6
Aerospike has delivered better query performance, a lower memory footprint for indexes, query stability and higher query throughput. Subsequent releases of Aerospike Database 6 will add more functionality and operational improvements to queries.
In Aerospike Database Community Edition (CE), the primary index and secondary indexes are stored in process memory. They must be rebuilt upon restart in a relatively lengthy cold restart.
In Aerospike Database Enterprise Edition (EE), the primary index is kept in shared memory by default, or optionally in persistent memory or on a flash device. This enables an Aerospike EE server to go through a warm (fast) restart, which is significantly faster. Server version 6.1 will add the ability to store secondary indexes in shared memory, allowing warm restarts of the Aerospike daemon (asd) when they are present. Later versions will enable secondary indexes to be stored in persistent memory and even on flash devices.
Currently, secondary indexes can be built over the top-level keys of a Map data structure. This is typically employed to index the top-level fields of JSON documents, which are stored in Aerospike as Maps. Server version 6.1 will add the ability to index elements nested at any depth.
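The limit to top-level keys can be pictured with a small model in which documents are plain dictionaries. The helper `index_top_level_keys` is a hypothetical name, and the sketch only shows which keys such an index can and cannot see; it is not how the server builds its indexes.

```python
from collections import defaultdict


def index_top_level_keys(docs):
    """Map each top-level key to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key in doc:  # top level only; nested maps are not walked
            index[key].add(doc_id)
    return dict(index)


docs = {
    "u1": {"name": "Ada", "address": {"city": "London"}},
    "u2": {"name": "Lin", "email": "lin@example.com"},
}
index = index_top_level_keys(docs)
```

In this model `name` and `address` are indexed, while the nested `city` key is not reachable, which is the gap that indexing at any depth is meant to close.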
Storing, Indexing and Querying JSON Documents
Since the introduction of Map and List collection data types (CDTs), developers have stored JSON documents in key-ordered Maps and used Aerospike as a document database. Developers use the rich Map and List APIs in multioperation transactions to query and manipulate document data atomically on the server side. Documents (Maps) are stored in a space-efficient MessagePack binary serialization, facilitating fast access.
The Aerospike Document API library, introduced in mid-2021, added the ability to store, modify and query documents using the popular JSONPath query language. The Document API splits each query in two: a part that executes server-side through the native Map API, and a remainder that is handled by a JSONPath library.
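One plausible way to perform such a split is to cut the path at the first step that needs JSONPath semantics (a wildcard, a filter or a recursive descent). The sketch below rests on that assumption; `split_path` is a hypothetical helper, it handles only simple path syntax, and the library's actual rules may differ.

```python
import re

# A step is ".name", "..name" (recursive descent) or a bracket expression.
_TOKEN = re.compile(r"\.\.[^.\[]*|\.[^.\[]+|\[[^\]]*\]")


def _needs_jsonpath(token: str) -> bool:
    """True if the step cannot be resolved by a direct map or list lookup."""
    if token.startswith(".."):
        return True
    if token.startswith("["):
        inner = token[1:-1]
        # A plain index such as [3] or a quoted key such as ['a'] is direct.
        is_direct = inner.isdigit() or re.fullmatch(r"'[^']*'|\"[^\"]*\"", inner)
        return not is_direct
    return token == ".*"


def split_path(path: str):
    """Split a path into (direct_prefix, jsonpath_remainder or None)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    tokens = _TOKEN.findall(path[1:])
    if "".join(tokens) != path[1:]:
        raise ValueError("unsupported path syntax")
    for i, token in enumerate(tokens):
        if _needs_jsonpath(token):
            return "$" + "".join(tokens[:i]), "$" + "".join(tokens[i:])
    return path, None
```

Under this assumption, `$.store.book[*].author` would fetch `$.store.book` with direct lookups and leave `$[*].author` for the JSONPath library, while `$.store.book[0].title` needs no second step at all.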
The Document API is available as a wrapper to the Java client and as an interface in the Aerospike gateway (also known as the REST client). The Document API library will be ported to other programming languages with an Aerospike client. With the upcoming capability to index deeply nested elements, Aerospike Database 6 enhances the development of applications that use a document model approach. Combined with strong consistency and Aerospike’s ability to scale up to petabytes of data and hundreds of billions of objects, while maintaining sub-millisecond transaction latencies, it results in a document database at scale.
Since the beginning, the client has had support for a simple batch get command to allow multiple records (or bins within them) to be retrieved together based on a list of keys. Similarly, a batch exists command checks for the existence of multiple keys from a specified list in a single call.
Later the client added the ability to execute the same multioperation transaction against a list of keys in parallel, using the batch operate command, but limited the type of operations in the transaction to read-only ones.
With server 6.0, the addition of batch write commands (delete, operate transactions without restrictions on write operations) completes the ability of a developer to batch anything in their application — reads, writes, updates, deletes or user-defined functions (UDFs). Logically related operations can be sent all at once to the database cluster.
Batch writes are more efficient than asynchronously launching a series of commands at the server. Using batch:
- Reduces the round-trip time (RTT) needed to complete all the operations, lowering the overall latency.
- Reduces network traffic, using fewer connections and combining operations into fewer IP packets.
- Improves parallelization, supporting faster data ingest.
Developers of applications with write-heavy or mixed workloads should consider converting from async writes to batch for better performance and a more stable cluster.
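The reduction in round trips can be sketched by grouping keys by the node that owns them, so that each node receives one sub-batch instead of one request per key. The placement function below is a hypothetical stand-in (CRC32 plus round-robin); it is not how a real cluster maps partitions to nodes.

```python
import zlib
from collections import defaultdict

N_PARTITIONS = 4096


def owner_node(key: str, nodes):
    # Hypothetical placement for illustration: a stand-in hash picks the
    # partition, and partitions are spread round-robin over the nodes.
    pid = zlib.crc32(key.encode("utf-8")) % N_PARTITIONS
    return nodes[pid % len(nodes)]


def plan_batch(keys, nodes):
    """Group keys by owning node: one sub-batch request per node."""
    sub_batches = defaultdict(list)
    for key in keys:
        sub_batches[owner_node(key, nodes)].append(key)
    return dict(sub_batches)


nodes = ["node-a", "node-b", "node-c"]
keys = [f"order{i}" for i in range(1000)]
plan = plan_batch(keys, nodes)
# 1000 individual writes mean 1000 requests; the plan needs at most
# len(nodes) requests, one per node that owns any of the keys.
```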
Server 6.0 adds three new granular privileges for role-based access control:
- The index-admin privilege grants a user the ability to add and drop secondary indexes.
- The udf-admin privilege grants users the ability to add and remove UDF modules.
Previously, these two capabilities were only available through the data-admin privilege, which some users were reluctant to grant widely. The third new privilege, truncate, is now a standalone privilege and no longer a part of the write privilege. Users representing applications that perform truncates should have the truncate privilege granted to one of their roles.
The breaking changes in server version 6.0 include:
- A storage format change (the addition of a 4-byte end marker to each record) requires that persistent storage devices (with the exception of PMEM) be erased as part of the upgrade. The header (first 8MiB) of raw SSD devices must be zeroized. See SSD Initialization.
- Several configuration parameters have been renamed or removed:
  - A small number of configuration parameters have been renamed:
    - scan-max-done to query-max-done
    - scan-threads-limit to query-threads-limit
    - background-scan-max-rps to background-query-max-rps
    - single-scan-threads to single-query-threads
  - The following query configuration parameters were removed.
  - The batch-without-digests configuration parameter was removed.
- The truncate privilege needs to be granted to applications using truncates. It is no longer part of the write privilege.
- The long-deprecated Predicate Filtering (PredExp) was removed. Use Filter Expressions.
- The “scan” module of the jobs: info command has been removed. Use the “query” module instead.
- Be aware that scan- and query-related metrics have changed. We will publish a separate blog to detail these changes.
- The jobs: info command, initially deprecated in server 5.7, is scheduled to be removed after six more months. Use query-show.
- The scan-show info command is now deprecated. Use query-show.
- The scan-abort info command is now deprecated. Use query-abort.
- The scan-abort-all command is now deprecated. Use query-abort-all.
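As a rough aid for the configuration renames listed above, the following sketch rewrites the four renamed parameters in configuration text and flags the removed batch-without-digests parameter. The `migrate_config` helper and its line-based matching are assumptions for illustration; it is not an official upgrade tool, and its output should be reviewed against the release notes.

```python
import re

# Renames and removals taken from the list above.
RENAMED = {
    "scan-max-done": "query-max-done",
    "scan-threads-limit": "query-threads-limit",
    "background-scan-max-rps": "background-query-max-rps",
    "single-scan-threads": "single-query-threads",
}
REMOVED = {"batch-without-digests"}

# Leading whitespace, a parameter name, then the rest of the line.
_LINE = re.compile(r"^(\s*)([A-Za-z0-9-]+)(\s.*)?$")


def migrate_config(text: str):
    """Return (new_text, warnings) for a block of configuration text."""
    out, warnings = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = _LINE.match(line)
        name = match.group(2) if match else None
        if name in RENAMED:
            # Keep indentation and value; swap only the parameter name.
            line = f"{match.group(1)}{RENAMED[name]}{match.group(3) or ''}"
        elif name in REMOVED:
            warnings.append(f"line {lineno}: '{name}' was removed in server 6.0")
        out.append(line)
    return "\n".join(out), warnings
```

Commented-out lines are left untouched because they do not start with a parameter name, and removed parameters are reported rather than deleted so that an operator can decide what to do with them.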