Flux: InfluxData’s New Language for Time-Series Data
So is the new “thing” a purpose-built database for every use case? That would seem to be the case with Amazon’s recent announcement of Timestream, a time-series database released in beta, which, along with its blockchain database Quantum Ledger, also in beta, brings Amazon’s total portfolio of databases to 15.
Time-series databases have been getting particular attention from upstarts in the market such as TimescaleDB, Iguazio, eXtremeDB and FaunaDB. And DB-Engines ranks the monitoring tool Prometheus among the most popular time-series databases.
Making SQL scale has been a major task for these projects, which are designed to track changes over time. The high data volume requires high-performance writes as well as high-performance reads, Redmonk analyst Rachel Stephens explains in a report on the time-series database market.
She points out that time-series databases tend not to be great at handling high-cardinality data, so users may also need deep debugging or tracing functionality. In addition, adding Kafka or Spark Streaming infrastructure can help users extract extra functionality from ephemeral data.
As a functional language, Flux allows users to define complex queries through a set of functional transformations on data. It also lets users recompose parts of queries as user-defined functions, letting them create shortcuts for common functionality.
“It’s horribly inefficient for developers to create the same queries over and over again,” Paul Dix, InfluxData founder and CTO, said in a blog post. “We want common queries and use cases represented and shared so that we stop re-inventing every individual query in InfluxDB. Because we have a common data collector (Telegraf) and a common schema, it should be possible to have reusable queries and functions built by the community.”
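The reuse Dix describes can be sketched with a user-defined Flux function; the function name, bucket and measurement here are illustrative, not published community functions:

```flux
// Hypothetical reusable function packaging a common pattern:
// fetch one measurement from a bucket over a given time range.
fetchMeasurement = (bucket, measurement, start) =>
  from(bucket: bucket)
    |> range(start: start)
    |> filter(fn: (r) => r._measurement == measurement)

// Callers no longer rewrite the from/range/filter boilerplate.
fetchMeasurement(bucket: "telegraf/autogen", measurement: "mem", start: -30m)
  |> mean()
```

Because Flux calls use named arguments, a shared function like this reads much the same as the built-ins it wraps.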
More Than a Query Language
In an interview with The New Stack, Dix said Flux, formerly called IFQL, is more than a query language.
“It more closely resembles a scripting language — and that’s the main difference with SQL. It’s a declarative query language. You can have much more complicated behavior within the language allowing you to do much more complicated analytics, data modification, ETL tasks, all this kind of stuff,” he said.
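A minimal Flux query shows the pipe-forward, scripting feel; the bucket and field names below are hypothetical:

```flux
// Fetch an hour of CPU metrics and average them.
from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
  |> mean()
```

Each `|>` pipes the output tables of one transformation into the next, so the query reads top to bottom as a data flow rather than as a single declarative statement.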
It’s actually two different things: the language itself and the query engine.
“We decoupled the language from the engine, so it’s on our roadmap to add other language support. The engine just takes a directed acyclic graph presented as a JSON object, and it knows how to execute that,” he said.
A parser parses the query, which reads like a functional-style scripting language defining how you want to work with the time-series data, into a DAG that the engine can execute on any number of Influx servers. The company plans to add the ability to query other data stores as well, so you could potentially take data stored in MongoDB, Postgres or elsewhere and pair it with the time-series data stored in Influx.
Visual interfaces will represent a data flow.
“Under the covers, in the engine, we may not run it exactly as specified because we want to do optimizations, but we don’t want the programmer to have to think about how to structure the query to get the best possible performance. We want them to specify the data flow, then we’ll keep the semantics of that and optimize it to work as fast as possible,” he said.
If you want to do three or four transformations on a set of data, it’s possible in SQL, but it may be neither easy nor the most readable way to represent the work to other programmers.
“… we’re not optimizing it for terseness. We’re optimizing it for readability. All our stuff is written in Go, and one of the things I think is the greatest thing about the Go programming language is that it’s optimized for readability. … As a programmer, you read 100 to 1,000 times more code than you write, so optimizing for readability is actually the best way to make programmers more productive in a team,” Dix said.
Flux’s strength, he said, lies in doing transformations such as windowing on a base set of data.
“With the time-series use case, you can get really complicated about how you window the data. If you’re looking at server performance, you can say, ‘Look at the time-series data from when the markets opened to when the markets closed for Monday through Friday, then do computations on it. Or for the past month, Monday-Friday when markets were open, and do computations on it.’ That kind of windowing, we make a first-class citizen. With SQL, you can do it, I believe, but frequently, these kinds of queries become very, very difficult,” Dix said.
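A market-hours query along the lines Dix describes might be sketched like this; the bucket, measurement and hour boundaries are illustrative assumptions, not the exact example he gives:

```flux
import "date"

// Sketch: keep only points from weekday "market hours" (9:00-16:00),
// then split the remainder into one-day windows and average each.
from(bucket: "telegraf/autogen")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) =>
      date.hour(t: r._time) >= 9 and date.hour(t: r._time) < 16 and
      date.weekDay(t: r._time) >= 1 and date.weekDay(t: r._time) <= 5)
  |> window(every: 1d)
  |> mean()
```

The windowing logic lives in ordinary filters and a `window()` call, rather than in the nested subqueries the same shape tends to require in SQL.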
Then known as IFQL, the language first shipped with open source InfluxDB 1.4 a year ago, and it has since changed considerably based on feedback from the community, Dix said. While part of InfluxDB 1.7 and 2.0, Flux also can be run independently.
InfluxData plans for a model that includes both input and output plug-ins, so in addition to being able to bring in data from other sources, you could send it somewhere else.
“You could have results of a query and instead of returning results to the user, you could send results to a third-party service — to another database or a message queue or you could send it right back into InfluxDB. These are data flow problems, and you see in use cases where people want to monitor something, then based on that send output to some other thing. Output plug-ins will enable functions within the language to call for that,” Dix said.
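The “send it right back into InfluxDB” case can be sketched with Flux’s built-in `to()` output function; the bucket names here are hypothetical:

```flux
// Downsample recent data and write the result to another bucket
// instead of returning it to the user.
from(bucket: "telegraf/autogen")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> mean()
  |> to(bucket: "downsampled")
```

An output to a message queue or third-party service would presumably follow the same pattern, with a plug-in-provided function in place of `to()`.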
Plug-ins for Grafana and Sensu monitoring are among those already available.
TNS Managing Editor Joab Jackson contributed to this article.
InfluxData is a sponsor of The New Stack.
Feature image via Pixabay.