Video Tutorial: Wrangling Time Series Data with Basho Riak TS
“They always say time changes things,” said Andy Warhol, “but you actually have to change them yourself.” In this spirit, Basho Technologies, Inc., has built Riak TS, a database optimized for fast reads and writes of time series data. Like Riak KV, Riak TS is built on top of the Riak Core distributed systems framework.
The New Stack’s Alex Williams recently connected with Robert Genova, Solutions Architect at Basho Technologies, for a demonstration of Riak TS.
Riak TS supports both key/value and time series use cases within the context of a single, unified platform that’s simple to operate and easy to scale, said Genova.
“TS colocates data in ordered ranges, and it supports a data definition language that provides customization and tunability with respect to exactly how that data is partitioned,” he said. “TS also offers an SQL-like query language that supports range queries, projections and filtering, and will ultimately support aggregations.”
As Genova explained, Riak TS lets users define a schema with an SQL-like CREATE TABLE statement. Tables can include any number of columns of the usual data types, such as integer, floating point, varchar, timestamp, and boolean.
“Every table must include a time-stamp column as well as a primary key definition, and the primary key uses time quantization to allow the user to specify the extent to which ordered ranges of data are partitioned together in a cluster,” said Genova.
“Primary keys are defined by a combination of the quantum function as well as a series ID,” said Genova. “Bucket type is created using the standard Riak admin command that folks that use KV would be familiar with, and the bucket type create statement includes the table definition as one of its properties.”
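To make the idea of time quantization concrete, here is a minimal Python sketch of how a quantum function could round a timestamp down to the start of its time bucket and combine it with a series ID. It is an illustration only, not Riak TS source code: the helper names, the unit letters, and the 15-minute quantum are assumptions chosen for the example.

```python
# Illustrative sketch only; not taken from Riak TS.
# All names here (UNIT_MS, quantum_start, partition_key) are hypothetical.

# Milliseconds per unit, assuming d/h/m/s style unit letters.
UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000}


def quantum_start(ts_ms: int, size: int, unit: str) -> int:
    """Round an epoch-millisecond timestamp down to the start of its quantum."""
    width = size * UNIT_MS[unit]
    return ts_ms - (ts_ms % width)


def partition_key(series_id: tuple, ts_ms: int, size: int, unit: str) -> tuple:
    """Combine a series ID with the quantum start (an assumed key layout)."""
    return series_id + (quantum_start(ts_ms, size, unit),)


if __name__ == "__main__":
    # Two readings from the same series that fall in the same 15-minute
    # quantum produce the same key, so they would be stored together.
    a = partition_key(("sensor-1",), 1_000_000, 15, "m")
    b = partition_key(("sensor-1",), 1_700_000, 15, "m")
    print(a == b)
```

In this sketch a larger quantum groups more rows under one key, while a smaller one spreads a series across more keys, which is the tunability Genova describes.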
Demonstrating how to create and store a time series row with the Riak Java client, Genova said, “The columns in each row are required to be in the correct order as well as have the correct types based on the schema that was defined for the table. Timestamps are required to be in milliseconds since the epoch.”
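The two requirements Genova mentions, column order and types matching the schema and timestamps expressed in milliseconds since the epoch, can be sketched in a few lines of Python. This is not the Riak Java client shown in the demonstration. The schema, column names, and helper functions below are hypothetical and only illustrate the kind of check involved.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: (column name, expected Python type), in table order.
SCHEMA = [("device", str), ("time", int), ("temperature", float)]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch.

    Integer timedelta division avoids floating-point rounding. Passing a
    naive datetime raises TypeError from the subtraction.
    """
    return (dt - EPOCH) // timedelta(milliseconds=1)


def validate_row(row, schema=SCHEMA):
    """Check that a row has the schema's column count, order, and types."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} columns, got {len(row)}")
    for value, (name, expected) in zip(row, schema):
        # bool is a subclass of int in Python, so reject it for non-bool columns.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise TypeError(f"column {name!r} expects {expected.__name__}")
    return row


if __name__ == "__main__":
    ts = to_epoch_ms(datetime(2016, 1, 1, tzinfo=timezone.utc))
    print(validate_row(["sensor-1", ts, 21.5]))
```

Swapping the first two values in the row would fail the check, because the position of each value, not a column name, determines which type it is compared against.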
Genova mentioned that in Riak TS 1.0, data validation will be server-side, but client-side validation will be supported in future versions.
Riak TS facilitates reads through its support for an SQL-like query language, Genova explained. “Queries consist of standard SELECT statements that allow you to specify a time range, a series ID, as well as which subset of your columns that you’d like to return. You can use secondary fields optionally to filter the results set, and the standard set of logical operators apply to that, so operators like equals, not equals, greater than, less than, etc.”
Genova walked through two examples: the first was a standard “SELECT from table” query, and the second selected a single field and also filtered on a secondary field.
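The shape of those two queries can be sketched as plain SQL-like strings built in Python. The table name, column names, and the build_query helper are hypothetical, and the exact syntax Riak TS accepts is not confirmed by the demonstration as reported here; the sketch only shows a time range, a series ID, an optional column subset, and an optional secondary-field filter.

```python
# Illustrative sketch only. Table and column names are hypothetical, and the
# strings are not guaranteed to match the exact query syntax of Riak TS.

def build_query(table, start_ms, end_ms, series_id, columns=None, extra_filter=None):
    """Build a SELECT with a time range, a series ID, and optional extras.

    Values are interpolated directly for readability; real code should not
    build queries from untrusted input this way.
    """
    cols = ", ".join(columns) if columns else "*"
    clauses = [
        f"time >= {start_ms}",
        f"time < {end_ms}",
        f"device = '{series_id}'",
    ]
    if extra_filter:
        clauses.append(extra_filter)
    return f"SELECT {cols} FROM {table} WHERE " + " AND ".join(clauses)


if __name__ == "__main__":
    # First example: every column for one series over a time range.
    print(build_query("sensor_readings", 1451606400000, 1451610000000, "sensor-1"))
    # Second example: a single field, filtered on a secondary field.
    print(build_query("sensor_readings", 1451606400000, 1451610000000, "sensor-1",
                      columns=["temperature"], extra_filter="temperature > 25.0"))
```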
Reads are optimized primarily through the colocation and ordering of the primary data, according to Genova. “This allows [us to] service queries from a tunable number of partitions, which allows the system to avoid the expensive coverage queries that would be necessary if you were to use secondary indexes to perform the same sort of query,” he said.
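A rough way to see why the number of partitions is tunable: if rows are grouped by fixed-width time quanta, a range query only needs the quanta that overlap its time range. The Python sketch below counts them. The function name and the 15-minute width are assumptions for illustration, and it does not model how Riak TS actually routes subqueries.

```python
# Illustrative sketch only; quanta_covered is a hypothetical helper.

def quanta_covered(start_ms: int, end_ms: int, width_ms: int) -> list:
    """Return the quantum start times touched by the range [start_ms, end_ms)."""
    if end_ms <= start_ms:
        return []
    first = start_ms - (start_ms % width_ms)
    last_ts = end_ms - 1  # last millisecond actually inside the range
    last = last_ts - (last_ts % width_ms)
    return list(range(first, last + 1, width_ms))


if __name__ == "__main__":
    fifteen_minutes = 15 * 60 * 1000
    one_hour = 60 * 60 * 1000
    # A one-hour range aligned to quantum boundaries touches four quanta;
    # widening the quantum to one hour would reduce that to one.
    print(len(quanta_covered(0, one_hour, fifteen_minutes)))
    print(len(quanta_covered(0, one_hour, one_hour)))
```

Under these assumptions, the quantum width chosen in the table definition directly bounds how many groups of data a given time range has to visit.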
“We are also filtering data at the level of the storage back end. In the previous example we filtered on the temperature field,” he continued. “Rather than being at the coordinator after subqueries return their full result sets, the filtering of that data will happen at the level of the storage back end, which provides another level of efficiency and minimizes network overhead.”
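The difference between the two filtering strategies can be shown with a toy Python model that counts how many rows would cross the network in each case. The in-memory lists stand in for partitions, and every name and value below is hypothetical; this is a sketch of the general idea, not of the Riak TS storage back end.

```python
# Toy model only: lists stand in for partitions, and the "network" is just a
# count of rows handed to the coordinator. All data here is made up.

PARTITIONS = [
    [("sensor-1", 1000, 20.0), ("sensor-1", 2000, 26.5)],
    [("sensor-1", 3000, 27.1), ("sensor-1", 4000, 19.8)],
]


def filter_at_coordinator(partitions, predicate):
    """Every row is transferred, then filtered centrally."""
    transferred = [row for part in partitions for row in part]
    results = [row for row in transferred if predicate(row)]
    return results, len(transferred)


def filter_at_backend(partitions, predicate):
    """Each partition filters locally, so only matching rows are transferred."""
    transferred = [row for part in partitions for row in part if predicate(row)]
    return transferred, len(transferred)


if __name__ == "__main__":
    def is_hot(row):
        return row[2] > 25.0  # filter on the temperature field

    print(filter_at_coordinator(PARTITIONS, is_hot)[1])  # rows transferred
    print(filter_at_backend(PARTITIONS, is_hot)[1])      # rows transferred
```

Both strategies return the same rows; in this toy model the only difference is how many rows are moved before the filter is applied.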
Genova noted a further benefit of the query language itself: because it is a subset of SQL, it offers flexibility and familiarity to the user.
To conclude, Genova stressed the maturity, reliability and ease of use of the underlying, highly scalable architecture, and also referenced the multi-datacenter replication capabilities that are inherent in that architecture. He said that the TS database will ultimately inherit this capability as well.
“The way we’ve designed this was to optimize it specifically [for] time series, rather than simply create a more generic, big table implementation,” he said, “and I think that provides another measure of ease of use.”
Basho is a sponsor of The New Stack.