Concord Leverages Mesos for High Performance Stream Processing
Twitter, which created Storm, recently open sourced its real-time stream processing engine Heron. And the startup Confluent just released a real-time analytics module for Kafka aimed at reducing the time it takes to implement a stream-processing cluster and making management easier.
Concord, an event-based distributed stream-processing framework built on top of Apache Mesos, also is jumping in, announcing a public beta. It touts its differentiators as dynamic topology, multi-language support and high performance.
“Storm wasn’t designed to handle the flexible nature of the jobs we wanted to run,” — Shinji Kim
“What we’re seeing more and more as far as streaming applications is that streaming processing jobs are becoming more like microservices or applications that require a certain amount of agility, cluster management, monitoring and the ability to debug these services, and that’s quite different from MapReduce or ETL jobs that run in the background as a report,” said co-founder and CEO Shinji Kim.
Based in New York City, the three-person team – Kim, Alexander Gallego and Robert Blafford – formed the company about a year and a half ago. It raised an undisclosed amount of seed funding from Bloomberg Beta and several angel investors including the co-founders and executives of Mesosphere.
While open source stream processing systems like Apache Storm and Spark Streaming focus on running a Hadoop MapReduce functions in real-time, Concord runs real-time applications as asynchronous services.
“We started the company because we had a lot of trouble scaling Apache Storm from our previous company where we were running large-scale stream processing for an ad-exchange network,” Kim explained. “We were scaling Storm from running hundreds of millions of data points every day to tens of billions of data points a day. Our infrastructure needed a lot of work every single day.”
The company was using Storm for multiple purposes, including serving ads, fraud detection and more.
“Storm wasn’t designed to handle the flexible nature of the jobs we wanted to run,” Kim said.
“We looked at similar solutions like Samza, Spark and others, but we weren’t able to make it very stable operationally and yet flexible in terms of making changes as you go. We’ve talked to a lot of other companies in similar stages where they’re processing a lot of data – in gaming as well as other ad networks – but they were building their own systems on top of message queues, their own schedulers. It was very hard for them to extend it after they built it for very specific purposes. So we thought there should be a better stream processor out there.”
She says the team got a lot of inspiration from Google MillWheel.
Concord’s core is built in C++, which provides predictable performance and scalability, Kim says, while other stream processors are JVM-based. Concord’s software supports Python, Scala, Ruby, Java C++ and Go natively on top of Mesos.
It also supports dynamic topology, allowing users to make a change or scale out part of the topology without a hard re-start.
“In Concord, because we treat each operator as its own Mesos task or containerized environment, topology management can be dynamic, meaning at runtime, you can change your topology, you can change code, you can scale parts of the job as it needs to be and not have to worry about downtime of the whole system,” she said. “This allows different teams to make changes without worrying about having cluster-wide failure or affecting other jobs. This makes iteration time faster for real-time applications that utilize the stream.”
The system also is highly performant. It claims 10-times faster throughput than Storm or Spark Streaming with latency is in the milliseconds. The company plans a blog post in the coming weeks detailing how it benchmarked those numbers, Kim said.
How it works
Users can create an operator in a high-level logical layer, leaving Concord to take care of distributing work in the physical layer. That logic can be written on top of Kafka, processed, then output to HDFS, MySQL or others.
In the physical layer, the Concord cluster uses Apache Zookeeper to store metadata needed in the event of failure; Mesos Master, which provisions and tracks the status of each agent and works together with the Concord Scheduler to scale jobs up and down; each operator is deployed in a Linux container operated by a Mesos Agent.
The Concord Executor within the Mesos Agent contains the operator, a tracing engine and a router, which determines where to send the result.
When Matt Asay, writing at TechRepublic, pressed Kim about when Concord is suboptimal to other options, she said, “Currently, Concord does ‘at-most-once’ processing, and it will lose its local state/cache if there’s a node or operator failure.” However, she added that it’s working on integrating with Kafka to support “at-least-once.”
She told Asay Concord would be best used “when you prefer performance over having the perfect result,” such as running a real-time bidding model on a programmatic ad exchange.
Mesosphere is a sponsor of The New Stack.