Most streaming data technologies demand a different mindset from developers than conventional relational databases do. Now Deephaven Data Labs, a startup focused on time series databases, has released Deephaven Community Core, a free, source-available version of its enterprise product, to address this challenge.
The new release lets developers and analysts with conventional database skills process and analyze real-time data. We say this based not on the contents of a press release, but on a detailed demo of the technology and a look at the code used in the demo.
Deephaven comes from the financial services world, having been developed for in-house use at Walleye Trading, a high-frequency trading firm, originally led by Deephaven’s CEO, Pete Goddard. The technology was developed specifically because databases already on the market didn’t serve Walleye’s high-data-volume, high-frequency requirements.
Deephaven is a 40-person company and is almost five years old. Its platform has been available for some time as a commercial enterprise offering, supporting customers who deploy it at scale, and is fortified with high availability and reliability features. The free Community Core offering, available as a Docker container image, opens up similar capabilities to those who are comfortable managing infrastructure themselves, or who want to “kick the tires” on Deephaven in development.
How It Works
Deephaven Community Core combines attributes of both streaming and batch processing (i.e., conventional database operations) by embracing an incremental update model. In this model, as new data comes in, Deephaven works in the background to identify what needs to be modified and keeps sophisticated maps of the updates. Like its enterprise sibling, Deephaven Community is updated in real time and can handle high-volume and high-frequency data, on the server or in-process.
The Deephaven IDE (integrated development environment) can provide rich dashboards (as shown in the figure below) that let users watch big data update in real time. The IDE takes the form of a console experience with a Python code editor and the ability to execute that code in place.
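To make the incremental update idea concrete, here is a minimal sketch in plain Python. It is not Deephaven code and does not use Deephaven's API; the `IncrementalSum` class, the `batch_sum` helper and the ticker symbols are all hypothetical, chosen only to show how a result can be kept current by processing just the newly arrived rows rather than recomputing over the full history.

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical per-key running total, updated from new rows only."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.rows_seen = 0

    def apply(self, new_rows):
        # Process only the delta and report which keys were modified.
        changed = set()
        for key, value in new_rows:
            self.totals[key] += value
            changed.add(key)
        self.rows_seen += len(new_rows)
        return changed


def batch_sum(all_rows):
    """Batch-style equivalent: recompute from the full history every time."""
    totals = defaultdict(float)
    for key, value in all_rows:
        totals[key] += value
    return totals


history = []
agg = IncrementalSum()
# Two made-up batches of (symbol, size) rows arriving over time.
for batch in ([("AAPL", 100.0), ("MSFT", 50.0)], [("AAPL", 25.0)]):
    history.extend(batch)
    changed = agg.apply(batch)
    # The incremental result always matches a full recomputation.
    assert dict(agg.totals) == dict(batch_sum(history))
```

The set returned by `apply` plays the role of a very crude "map of updates": anything downstream only needs to look at the keys that changed.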
Deephaven uses Apache Kafka, Apache Arrow and Arrow Flight under the hood. Code written against the platform can subscribe to a Kafka topic and bring back a Deephaven table object, which strongly resembles a Pandas DataFrame. But the table objects aren’t static the way DataFrames would be; they are automatically updated in real time as new data streams in. Visualizations and data grids in the IDE that are connected to the table objects will likewise update automatically as the underlying data changes. And no explicit code is required to support any of this.
Deephaven Community also supports so-called derived streams. These are live views created simply by running queries against existing table objects. This way, users do not have to connect or subscribe to Kafka topics explicitly; instead, everything is done through a SQL query and an assignment statement.
Deephaven CEO Goddard told The New Stack that “batch and streams are one thing and they can actually work together. The idea of ‘Oh, I want to join them together and create a new derived version…’ A new thing that brings data together and creates another presentation and shows up on a screen, or feeds a downstream enterprise app, or creates an alert that comes to your phone…the ability to create those derived streams is really important and Deephaven is uniquely capable of it… with us, you just literally name a table — boom — now it’s available via the API… anyone that’s hooked up can consume it.”
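The derived-stream idea can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Deephaven's implementation or API: the `LiveTable` class, its `where` method and the sample trades are hypothetical. It shows only the pattern described above, in which a query result is assigned to a name and then stays current as the source table receives new rows.

```python
class LiveTable:
    """Hypothetical append-only table that pushes new rows to derived tables."""

    def __init__(self):
        self.rows = []
        self._listeners = []

    def append(self, row):
        self.rows.append(row)
        # Notify every derived table that depends on this one.
        for listener in self._listeners:
            listener(row)

    def where(self, predicate):
        """Return a derived table that keeps tracking this table."""
        derived = LiveTable()
        # Seed the derived table with the rows that already exist.
        for row in self.rows:
            if predicate(row):
                derived.append(row)

        def on_new_row(row):
            if predicate(row):
                derived.append(row)

        self._listeners.append(on_new_row)
        return derived


trades = LiveTable()
trades.append({"sym": "AAPL", "size": 100})

# A query plus an assignment; no explicit subscription code afterwards.
big_trades = trades.where(lambda r: r["size"] >= 500)
big_msft = big_trades.where(lambda r: r["sym"] == "MSFT")

trades.append({"sym": "MSFT", "size": 900})
trades.append({"sym": "AAPL", "size": 10})
trades.append({"sym": "AAPL", "size": 700})

print(big_trades.rows)
print(big_msft.rows)
```

Because `big_msft` is derived from `big_trades`, which is itself derived from `trades`, a single append to the source propagates through both views.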
BI and AI
As with conventional databases, Deephaven can ensure updates are atomic within a single stream or source, or across a variety of sources. JDBC and ODBC connectors, which grab snapshots of the data, are offered to connect BI tools to Deephaven. When users refresh their reports, they get all the data that has streamed in since the last update.
On the AI front, users can train models based on real-time and historical data. Deephaven Community offers a module called Learn that integrates Deephaven with the likes of PyTorch and TensorFlow, so that users can combine them. This combination can even support continuous incremental retraining scenarios.
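The snapshot-and-refresh behavior can be sketched as follows. This is an illustrative assumption about the general pattern, not a description of how Deephaven's JDBC or ODBC connectors work internally; `StreamingTable`, `append_batch` and `snapshot` are hypothetical names.

```python
import threading


class StreamingTable:
    """Hypothetical table whose batches become visible all at once."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def append_batch(self, rows):
        # Holding the lock makes the batch atomic with respect to snapshots:
        # a reader sees either none of it or all of it.
        with self._lock:
            self._rows.extend(rows)

    def snapshot(self):
        # Return an immutable copy that later appends cannot change.
        with self._lock:
            return tuple(self._rows)


table = StreamingTable()
table.append_batch([("AAPL", 100), ("MSFT", 50)])

report = table.snapshot()            # what a BI tool sees at load time
table.append_batch([("AAPL", 25)])   # data keeps streaming in

refreshed = table.snapshot()         # what it sees after a refresh
new_rows = refreshed[len(report):]   # rows that arrived since the last snapshot
```

The first report is unaffected by the later batch, while the refreshed snapshot picks it up.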
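As a rough illustration of continuous incremental retraining, the sketch below updates a one-parameter model each time a new batch arrives, instead of retraining from scratch. It uses plain Python as a stand-in for PyTorch or TensorFlow and does not show the Learn module's API; `OnlineLinearModel`, `partial_fit`, `stream_of_batches` and the data (points on the line y = 2x) are all hypothetical.

```python
class OnlineLinearModel:
    """Hypothetical model y = weight * x, trained by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def partial_fit(self, batch):
        # Update the existing weight using only the newly arrived samples.
        for x, y in batch:
            error = y - self.predict(x)
            self.weight += self.learning_rate * error * x


def stream_of_batches(n_batches):
    """Stand-in for a live feed: each batch holds points on the line y = 2x."""
    for _ in range(n_batches):
        yield [(1.0, 2.0), (2.0, 4.0)]


model = OnlineLinearModel()
errors = []
for batch in stream_of_batches(20):
    model.partial_fit(batch)
    # Track how far the learned weight is from the true slope of 2.
    errors.append(abs(2.0 - model.weight))
```

Each batch nudges the weight closer to the true slope, so the model improves as data streams in without ever revisiting earlier batches.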
Deephaven and its IDE don’t support all common BI tool features; conversely, today’s BI tools were not designed to work with real-time data. But those features may yet be added to Deephaven, depending on customer demand. For future development, Goddard says, “We’d like the community to direct us on what they want us to do.”
Deephaven’s group of investors consists of parties affiliated with Walleye Trading, as well as friends and family, which likely fosters a focus on revenue. That, combined with the fact that Deephaven Community Core’s license prevents offering the platform as a service, gives the company rational priorities and prudent protection of its intellectual property.
It’s hard to bring a new database engine to market, but Deephaven has some very demanding customers on its platform and seemingly has the hardening that usually comes with that. If the company can build and develop a real community and ecosystem with this new release, it will be worth consideration by developers, and by the firms that employ them.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.