Confluent, the company created by the team that built Kafka at LinkedIn, has raised $24 million in Series B funding led by Index Ventures, with participation from existing investor Benchmark.
Mike Volpi, partner at Index Ventures, will be joining the Confluent board, which also includes Eric Vishria, partner at Benchmark.
Jay Kreps (now Confluent’s CEO), Neha Narkhede and Jun Rao left LinkedIn to create the company nine months ago with $6.9 million in investment from Benchmark, LinkedIn and Data Collective.
They hoped to commercialize Kafka in a way similar to Red Hat’s Linux business by providing added support and services. Kafka was open sourced in early 2011 and emerged from the Apache Incubator in October 2012.
Kafka is designed to be a centralized pipeline handling myriad streams of data flowing through a company in near real time, then sending that data on to where it should go next. It can serve as a high-capacity ingestion route for Apache Hadoop or traditional data warehouses, or as a foundation for advanced stream processing using Apache Spark, Storm or Samza.
It collects an array of high-volume information, such as user activity data, logs, application metrics, stock ticker data and device instrumentation, and makes it available as a real-time stream for consumption by systems with very different requirements, according to the company. Those systems range from batch systems like Hadoop, to real-time systems that require low-latency access, to stream processing engines that transform the data streams as they arrive.
“Kafka serves as kind of the central nervous system for data, but they always build out tools like monitoring tools and connectors to different systems, infrastructure to make sure their data is getting from place to place. So what we do is offer support for Kafka and build out that platform, what we call the Confluent platform, so people don’t have to build all that software in-house,” Kreps said.
Kreps told Gigaom when the company formed that he initially wasn’t sure there would be interest in the technology beyond web companies, but Confluent is positioning Kafka as a way to power Internet-of-Things and sensor-based applications.
Kafka is designed as a distributed system that can store a high volume of data on commodity hardware, and multiple applications can each subscribe to particular streams of that data. It doesn’t actually process data, Kreps said, but manages the inflow of data to other systems.
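The subscription model described above can be illustrated with a small in-memory sketch. This is not Kafka's API or implementation: the `TinyLog` class, its `publish` and `poll` methods, and the topic and subscriber names are all hypothetical, and the sketch assumes a single process with no distribution or persistence. It only shows the idea of records stored once per stream while each subscriber reads at its own pace.

```python
from collections import defaultdict


class TinyLog:
    """In-memory stand-in for a topic-based log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered list of records
        self._offsets = defaultdict(int)   # (subscriber, topic) -> next index to read

    def publish(self, topic, record):
        """Append a record to a topic and return its position in that topic."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def poll(self, subscriber, topic, max_records=10):
        """Return this subscriber's next unread records and advance its offset."""
        start = self._offsets[(subscriber, topic)]
        batch = self._topics[topic][start:start + max_records]
        self._offsets[(subscriber, topic)] = start + len(batch)
        return batch


log = TinyLog()
for event in ("login", "click", "logout"):
    log.publish("user-activity", event)
log.publish("app-metrics", {"cpu": 0.42})

# Two subscribers read the same stream independently of each other.
realtime = log.poll("alerting", "user-activity", max_records=2)
batch = log.poll("hadoop-loader", "user-activity")
rest = log.poll("alerting", "user-activity")
```

The log itself does no processing here; it only stores records and hands them out, which mirrors the division of labor Kreps describes. Keeping a separate read position per subscriber is what lets a slow batch loader and a low-latency consumer share one stream without interfering with each other.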
Microsoft recently unveiled its similar Azure Event Hubs, while Amazon has Kinesis for data ingestion. Meanwhile, companies including Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco and Goldman Sachs use Kafka.
With the new investment, Confluent will continue developing the platform that it introduced in February. Plans include improving security features and monitoring tools, adding connectors to different systems so users don’t have to do a lot of programming, and deepening the integration with different stream-processing systems.
Because this work is part of an open source framework, Kreps foresees various companies contributing in this area.
“I think what’s exciting about what we’re doing is we’re taking a bunch of data that you would only get as a batch at the end of the day — you would get some kind of CSV dump or log files might get shipped around — and we’re making all of that available as real-time streams. It can be subscribed to by any application or other system, it can be processed in real time, or it can be loaded into other data systems … it opens up new applications and creates a new way to look at data as a continuous stream of things that happen. That’s really how any modern business works …” he said.
The Confluent platform offers Kafka and integration with Hadoop, along with RESTful access for applications, and enterprise-level support.
“Our goal is to really expand that platform to include everything you need to deal with this kind of real-time streaming data,” Kreps said.
Cisco and Red Hat are sponsors of The New Stack.