Twitter, Yahoo Vets Pool Real-Time Processing Knowledge in Startup Streamlio

17 Aug 2017 9:00am

Streamlio, a startup founded by Twitter and Yahoo veterans, has come out of stealth with a platform offering enterprise-grade real-time data processing.

The company takes its name from a combination of “stream” processing, “ML” for machine learning, and “IO” for messaging and storage. Its platform incorporates the open source projects Apache Pulsar (incubating), Heron, and Apache BookKeeper.

The Palo Alto, Calif.-based company, now just 14 engineers, plans to build out its business using an oversubscribed Series A round of $7.5 million led by Lightspeed Venture Partners.

A key aspect of the company’s vision is technology that takes action based on real-time data at enterprise scale without human intervention.

“The world is moving to real time and applications need to become event-driven with this idea of no humans in the loop. … Data is continuously in flight, and applications need to be continuously available and contextual,” said co-founder and CEO Lewis Kaneshiro.

Other co-founders include Karthik Ramasamy and Sanjeev Kulkarni, co-creators of Heron; Sijie Guo, tech lead of Twitter’s Apache DistributedLog and PMC chair of Apache BookKeeper; and Matteo Merli, tech lead of Yahoo’s Apache Pulsar and PMC member of Apache BookKeeper.

Event-driven Architecture

Last July, Gartner said that event-driven architecture, a design in which a software component takes action after receiving one or more event notifications, is a key technology approach to delivering on the goal of digital transformation.

“Organizations must be able to respond to and take advantage of ‘business moments’ [a transient opportunity] and these real-time requirements are driving CIOs to make their application software more event-driven,” Gartner’s Anne Thomas said at the time.

Gartner describes event-driven architecture as the foundation, stressing that it must support continuous availability, massive scalability, automatic recovery and dynamic extensibility.

The Streamlio team set out to address the problems enterprises face with such applications, including data loss, complexity, geo-replication and an inability to scale.

“Up until now, you see data-driven strategies driving, say, visualizations or even reports. The human will evaluate that, then make a decision. We actually believe we’re entering an exciting era of real-time action, where not only is the data being processed, but there has to be an underlying action that takes place with it,” Kaneshiro said.

One example in financial services would be a real-time extract, transform and load (ETL) process feeding internal data lakes for algorithmic trading, or propagating that ETL data out into real-time systems.

In a smart cities scenario, real-time stress data on infrastructure, such as from sensors on bridges, and weather data could be used to optimize traffic patterns or automate emergency response services.
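
To make the "no humans in the loop" idea concrete, here is a minimal sketch of an event-driven handler for the bridge-sensor scenario. It is an illustration only, not Streamlio's implementation: the in-process `EventBus` stands in for a real messaging system, and the topic name, event fields, `STRESS_LIMIT` threshold, and `on_bridge_stress` handler are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, float]
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-process stand-in for a messaging system (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a component that reacts to events on a topic.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every subscriber; no human reviews it first.
        for handler in self._handlers[topic]:
            handler(event)


STRESS_LIMIT = 0.8  # hypothetical normalized threshold, chosen for illustration

actions: List[str] = []  # records the automated actions that were triggered


def on_bridge_stress(event: Event) -> None:
    # Take action as soon as a reading crosses the threshold.
    if event["stress"] > STRESS_LIMIT:
        actions.append(
            f"reroute traffic away from bridge {int(event['bridge_id'])}"
        )


bus = EventBus()
bus.subscribe("bridge-stress", on_bridge_stress)
bus.publish("bridge-stress", {"bridge_id": 7, "stress": 0.42})  # below limit
bus.publish("bridge-stress", {"bridge_id": 7, "stress": 0.93})  # above limit
print(actions)
```

Only the second reading triggers an action. In a production system the bus would be a durable, replicated messaging layer and the handler would run inside a stream processing engine, but the shape of the logic (subscribe, receive an event, act) is the same.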

Enterprise Grade

Streamlio uses Heron as its compute engine. Twitter built Heron as a replacement for Storm to improve speed and scale for real-time stream processing, while designing it to be backward compatible with Storm. In a paper outlining the switch, Twitter reported that Heron used fewer CPU resources, improved throughput and reduced latency when compared with Storm. Twitter open sourced the technology last year and has used Heron for all real-time compute for more than three years.

Twitter used Heron for real-time ETL and real-time business intelligence, such as determining trends based on factors like location and topic, according to Ramasamy, who was engineering manager for the Heron team and is now Streamlio CTO.

Machine learning lets you continuously model changes in user behavior, for instance, which Twitter applied in a number of ways, such as making recommendations and running real-time operations that analyze data from more than a million source points, he said.

Messaging system Apache Pulsar was designed and built at Yahoo to deal with problems such as data loss. Yahoo has used it for more than three years for its strong durability guarantees, high throughput, low latency, and multi-data-center geo-replication. Pulsar was open sourced in late 2016.

Apache BookKeeper, with contributions merged with Twitter’s DistributedLog, provides stream storage. Twitter and Yahoo have used it for more than four years. It’s designed to support enterprise publish/subscribe and provide performant durability guarantees, and it is configurable for storage extending more than a year.

Streamlio runs on containerized technology, with Kubernetes orchestration. However, it’s more than just a managed version of these three open source technologies, according to Kaneshiro.

“What we deliver are modules of machine learning that provide an end-to-end enterprise solution that runs on the Streamlio platform,” he said. It’s delivered on premises, in the cloud, hybrid, and at the edge.

As a preview, you can try out Streamlio Sandbox locally on your laptop.

 
