Yaron Haviv, Chief Technology Officer and co-founder of Iguazio, joins us for this episode of The New Stack Makers podcast. Haviv’s deep technological experience in big data, cloud, storage and networking led him to head the team that created Iguazio’s new product, nuclio.
This open-source project, launched in December 2017, is an extremely fast serverless platform with a real-time processing engine. “Extremely fast” here means 100x faster than AWS Lambda. “Everyone knows that the value of data diminishes over time,” said Haviv, “so that means when the data starts flowing into the system, you need to start aggregating, contextualizing and acting on it as it flows.”
The question was how to process data at extreme speed and at scale, moving from the notion that data is something that you store and query to thinking of data as something you continuously process.
Under the hood, nuclio is an abstraction layer built on Kubernetes, which supports a large variety of native event sources. It’s used for building complex, orchestrated microservices that can be invoked transparently across multi-cloud environments. Or, as Haviv put it, “It makes sure that you can contextualize a lot of data in real time and drive decisions on top of that data.”
Nuclio started because the team wanted to simplify moving code into production.
“Developing code is easy, but operationalizing code is much harder,” he said. To operationalize code, you need to add debug points and observation points, and think about logging and auto-scaling. Next, there’s the load balancer in front or, if you are streaming, you need to think about sharding. Then there’s version control, rolling upgrades, etc.
“When we wrote code, we wanted to push a button and get rolling upgrades, central logging, central observation, so we don’t have to duplicate the process over and over again every time we create new functionality on top of the platform,” he said. “Then our customers asked, ‘Why are you guys spoiled?’ They wanted the same functionality.”
Haviv said the team looked to open source to make nuclio as modern as possible. They adopted code from the community, contributed to other open source projects, found best practices, and came up with three major innovations of their own.
The first innovation that makes Iguazio so fast, explained Haviv, is that they engineered the data to reside on Flash but perform like an in-memory database. That makes it thirty times cheaper and thirty times more dense. And, he said, it allows the user to do a lot more interesting things.
The second innovation is creating one database engine instead of having their own API. They essentially took all the open source implementations and Amazon implementations of APIs and piled them on top of the stack as microservices, abstracting them away from the users.
That also allows users to stream time-series data through one API and read the exact same data through another API. So, for example, these customers could stream sensor data into the platform and at the same time run a SQL query against all the sensor data that is collected.
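The idea of writing through one interface and reading the same data through another can be sketched in a few lines. This is only an illustration, not Iguazio's engine or API: it uses Python's built-in sqlite3 as a stand-in for the shared data layer, and the table name, `put_record` and `average_temperature_by_sensor` are hypothetical names chosen for the example.

```python
import sqlite3

# Stand-in for a shared data layer (assumption: an in-memory SQLite
# database, which is NOT how Iguazio's platform is implemented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings (sensor_id TEXT, ts INTEGER, temperature REAL)"
)


def put_record(record):
    """Streaming-style write path: append one reading as it arrives."""
    conn.execute(
        "INSERT INTO sensor_readings (sensor_id, ts, temperature) VALUES (?, ?, ?)",
        (record["sensor_id"], record["ts"], record["temperature"]),
    )


def average_temperature_by_sensor():
    """SQL read path over the very same rows the stream wrote."""
    rows = conn.execute(
        "SELECT sensor_id, AVG(temperature) FROM sensor_readings "
        "GROUP BY sensor_id ORDER BY sensor_id"
    ).fetchall()
    return dict(rows)


# Simulated sensor stream (made-up values for illustration only).
stream = [
    {"sensor_id": "a", "ts": 1, "temperature": 20.0},
    {"sensor_id": "b", "ts": 1, "temperature": 30.0},
    {"sensor_id": "a", "ts": 2, "temperature": 22.0},
]

for reading in stream:
    put_record(reading)
    # The query can run at any point while records keep arriving.
    print(average_temperature_by_sensor())
```

The point of the sketch is that the writer never issues a query and the reader never sees the ingestion interface, yet both operate on one copy of the data.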
The third innovation was creating all this as what Haviv calls a “Data PaaS,” or Data Platform-as-a-Service, based on Kubernetes as a platform, with everything containerized and engineered to auto-scale. It’s very flexible, with programmable APIs, and is cloud-native with all the bells and whistles.
For example, for API gateways, instead of reinventing the wheel, they just grant their customers access. So with Envoy and Istio, the functionality is built in, but the customer doesn’t have to set it up.
So what are customers doing with all that speed? nuclio allows a ride-sharing company in Asia to see all the information from all of its users’ mobile devices and, in real time, build aggregations around maps, volume, demand, and pricing for decision analysis (think surge pricing).
Another example is detecting fraud in real time. Basing decisions on a wide variety of information coming from the application itself, stock prices, maybe even a Twitter feed, customers can immediately detect any fraud in their system, based on parameters they select.
“You have to mesh many different data points on top of one vector, and on top of that vector, you run an algorithm and make a decision,” said Haviv.
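As a rough sketch of what Haviv describes, several data points can be combined into one vector and then scored by an algorithm. Everything in the example below is an assumption made for illustration: the `Event` fields, the weights and the threshold are hypothetical and do not come from nuclio, Iguazio or any customer deployment.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical inputs drawn from different sources."""
    amount: float               # from the application itself
    account_avg_amount: float   # historical context (assumed to be > 0)
    stock_move_pct: float       # from a market data feed
    social_alert_count: int     # e.g. matching posts in a social feed


def to_vector(event):
    """Mesh the separate data points into a single feature vector."""
    return [
        event.amount / event.account_avg_amount,
        abs(event.stock_move_pct),
        float(event.social_alert_count),
    ]


# Illustrative weights and threshold; a real system would tune or learn these.
WEIGHTS = [0.5, 0.1, 0.3]
THRESHOLD = 3.0


def is_suspicious(event):
    """Run a simple weighted-sum rule over the vector and make a decision."""
    score = sum(w * x for w, x in zip(WEIGHTS, to_vector(event)))
    return score >= THRESHOLD


unusual = Event(amount=10000.0, account_avg_amount=1000.0,
                stock_move_pct=0.0, social_alert_count=0)
typical = Event(amount=1000.0, account_avg_amount=1000.0,
                stock_move_pct=1.0, social_alert_count=0)

print(is_suspicious(unusual), is_suspicious(typical))
```

A weighted sum is used here only because it is easy to read; the same vector could just as well be fed to a trained model.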
Listen in to find out how an open source server can be simpler than AWS Lambda, nuclio’s relationship to open source, and how the company got its name.
In this Edition:
1:37: What pain points or problems does nuclio solve?
4:02: The difference between data mining and data lakes.
7:08: How businesses are using this platform to make their companies better.
10:26: The things developers need to consider when they’re trying to decide if serverless is for them.
14:50: What pieces of the stack already need to be in place in order for nuclio to work?
16:41: Exploring Haviv’s work on creating industry standards with the Cloud Native Computing Foundation.
The Cloud Native Computing Foundation is a sponsor of The New Stack.
Feature image via Pixabay.