Wallaroo Labs Promises Easily-Scalable Big Data Processing Infrastructure
But unlike Spark and Storm, it wants to do so without Java and the JVM.
Founder and CEO Vid Jain launched the company in 2014 after his experience with developing trading algorithms for Merrill Lynch.
“We found that building algorithms was the easiest part,” he said of the computations, which had to handle lots of data and run at huge scale without losing a trade or processing one out of order.
“We found that every time we deployed algorithms, we were solving the same infrastructure problems. We spent most of our time on scaling.” He also saw that algorithms were becoming the differentiator for myriad businesses including cybersecurity, advertising and dating.
At the same time, he foresees a future with vastly more data coming in faster and from far more sources, making processing it that much more complex. “Everyone’s already spending too much time mucking around with infrastructure instead of innovation, and we wanted to fix that,” he said.
Built on Pony
Founded in 2014 and previously called Sendence, the New York-based company changed its name to Wallaroo Labs, after its sole product, last fall. Jain said the name change marked the evolution from a consultancy to a product company — and its staff was more willing to wear a Wallaroo T-shirt than one with the name Sendence.
It set out with several high-level goals in mind:
- To allow developers to focus on their business logic rather than distributed computing “plumbing.”
- To provide portable, high-performance and low-latency data processing.
- To manage in-memory state for the application.
- To allow applications to scale as needed, even when live.
In a blog post about building its own Kafka client (the software supports two types of sources and sinks: TCP and Kafka), the company explained that Pony, the language Wallaroo is written in, provides reliable, low-overhead concurrency with data safety, though it’s not a practical option for use with the JVM due to the overhead involved.
The Wallaroo Labs platform supports Go and Python natively with the Go API and Python API, and provides its own processing engine.
“Say you’re a Python developer. You’d use the Python API to implement the business logic: What is this data? How do I write in this data? What operations do I need to do on the data? Apply some machine learning to it. Put in some result or alert. Developers can write these code snippets that are the business logic. Just that. They don’t have to deal with any scaling issues or plumbing issues or messaging issues or if something crashes how to restart it. All that is taken care of by us,” Jain explained.
That code runs inside the Wallaroo Labs engine, which is spread across however many servers you need.
“It could be on your laptop, on 20 machines with AWS, on five machines on premise, on 50 machines on Google Cloud. It just runs wherever it needs to run and that’s totally transparent to the developer,” he said.
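The split Jain describes, where the developer writes only the per-record logic and something else decides how and where it runs, can be illustrated in plain Python. This is a rough sketch and not Wallaroo's Python API: the functions `decode`, `score`, `to_alert` and the single-process `run_pipeline` stand-in for the engine are all hypothetical names invented for this example.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


# --- business logic: the only part the developer would write ---
def decode(raw: str) -> dict:
    """Parse a 'user,amount' line into a record."""
    user, amount = raw.split(",")
    return {"user": user, "amount": float(amount)}


def score(record: dict) -> dict:
    """Flag large amounts (a stand-in for a real model)."""
    return {**record, "suspicious": record["amount"] > 1000.0}


def to_alert(record: dict) -> Optional[str]:
    """Produce an alert string only for suspicious records."""
    if record["suspicious"]:
        return f"ALERT {record['user']} {record['amount']:.2f}"
    return None


# --- hypothetical stand-in for the engine (single process, no scaling) ---
def run_pipeline(source: Iterable[str], steps: Sequence[Callable]) -> Iterator:
    """Feed each input through the steps in order, dropping None results."""
    for item in source:
        for step in steps:
            item = step(item)
            if item is None:
                break  # a step filtered this record out
        else:
            yield item


alerts = list(run_pipeline(["ann,50", "bob,2500"], [decode, score, to_alert]))
```

The three business-logic functions know nothing about where they run, so in principle the runner could be swapped for a distributed one without touching them.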
“We wanted to build a framework for the future that combines the best of serverless and the best of the Big Data stack,” Jain said.
Big Data solutions, meanwhile, are harder to use, harder to scale, and they’re mostly all written in Java. If you want to write some Python or Golang code, they’re not really designed for that. But they’re portable — you can run them anywhere — you can run them on-prem, in any cloud.
Wallaroo is easy to scale, lets you use modern languages like Python and Go, and has much better performance than even the Big Data solutions, according to Jain. It can run anywhere and handle complex applications.
It eliminates the performance hit incurred when developing in Python or Go and then translating to Java for production. That whole setup for sending data back and forth becomes complex and difficult to scale, he said.
The actor model approach encapsulates data, minimizes coordination and keeps application state close to the computation.
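To make the actor idea concrete, here is a minimal sketch in Python, assuming a thread and a queue as stand-ins for an actor runtime. It is not how Pony or Wallaroo implement actors, and `CounterActor` is a hypothetical name. The point it shows is that the state lives inside one actor and is changed only by messages it processes one at a time, so senders need no locks.

```python
import queue
import threading


class CounterActor:
    """An actor: private state, a mailbox, and one thread that drains it."""

    def __init__(self) -> None:
        self._counts = {}              # state owned by this actor only
        self._mailbox = queue.Queue()  # thread-safe FIFO of incoming messages
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, key: str) -> None:
        """Deliver a message asynchronously; callers never touch the state."""
        self._mailbox.put(key)

    def stop(self) -> dict:
        """Queue a stop signal, wait for the mailbox to drain, return a snapshot."""
        self._mailbox.put(None)
        self._thread.join()
        return dict(self._counts)

    def _run(self) -> None:
        while True:
            key = self._mailbox.get()
            if key is None:
                break
            # No lock needed: only this thread reads or writes _counts.
            self._counts[key] = self._counts.get(key, 0) + 1


def send_many(actor: CounterActor, n: int) -> None:
    for _ in range(n):
        actor.send("trade")


actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
result = actor.stop()
```

Four threads send concurrently, yet every message is counted, because the only coordination point is the mailbox.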
Managing stateful applications is key to Wallaroo, something most serverless frameworks don’t handle well, Jain said.
“You as an application developer don’t have to think about resiliency or scalability of that state. That’s one of the things our engine and API handle for you,” Jain said.
“StateComputation” is one of the building blocks of a Wallaroo application. Updates are written to an event log that can be replayed in case of failure. Exactly-once message processing is another option to eliminate duplicates.
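The recovery idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Wallaroo's implementation or API: `LoggedCounter` is a hypothetical class, an in-memory list stands in for a durable event log, and dropping already-seen message IDs is just one common way to get an exactly-once effect.

```python
class LoggedCounter:
    """Per-key running totals whose updates are logged before being applied."""

    def __init__(self, log=None):
        # In-memory list as a stand-in for a durable, append-only event log.
        self.log = log if log is not None else []
        self.totals = {}
        self.seen = set()
        # Recovery: rebuild state by replaying every logged update in order.
        for msg_id, key, amount in self.log:
            self._apply(msg_id, key, amount)

    def _apply(self, msg_id, key, amount):
        self.seen.add(msg_id)
        self.totals[key] = self.totals.get(key, 0) + amount

    def process(self, msg_id, key, amount):
        """Apply an update once; return False if msg_id was already handled."""
        if msg_id in self.seen:
            return False  # duplicate delivery, ignored
        self.log.append((msg_id, key, amount))  # log first, then apply
        self._apply(msg_id, key, amount)
        return True


live = LoggedCounter()
live.process(1, "AAPL", 100)
live.process(2, "AAPL", 50)
duplicate_applied = live.process(2, "AAPL", 50)  # redelivered message

# Simulate a crash: a fresh instance rebuilds its state from the log alone.
recovered = LoggedCounter(log=list(live.log))
```

After the simulated crash, `recovered.totals` matches `live.totals`, and the redelivered message has been counted only once.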
The company’s closest competitors are the DIY crowd, which is the biggest chunk, followed by those pursuing serverless and by Java developers building on Spark and Storm. Companies with large Java investments are not its target, Jain said.
“There’s a whole continuum. You have people who have more complex workflows or more high-performance needs than serverless. You have people who have complex workflows or more high-performance needs, but they want to work in Golang or Python,” he said. Wallaroo might be the answer for them.