Faust, a Python-Based Distributed Stream-Processing Library
“Faust comes with the benefits of Python — it’s just very simple to get started with,” said data infrastructure engineer Vineet Goel, one of its developers. “Robinhood has a relatively small team building a lot of different systems, so the simplicity of getting started and ease of use are things that help you.”
The Menlo Park, Calif.-based company wanted a distributed system to help with the scaling challenges it was facing. It does a lot of batch processing, principal software engineer Ask Solem said, and because it is a Python shop, it wanted something in Python that was easy to use.
Solem, the creator of the task queue Celery, has a background in large-scale Python projects, and Goel has one in distributed systems. Combined with Goel’s research into Apache Kafka, they said, that made them confident they could make this happen.
“While existing streaming systems use Python, Faust is the first to take a Python-first approach at streaming, making it easy for almost anyone who works with Python to build streaming architectures,” according to Goel.
Faust takes heavy inspiration from Kafka Streams, yet takes a different approach, notably that it does not use a domain-specific language. As a Python library, it can be dropped into any existing Python code, with support for all the libraries and frameworks Python developers like to use, such as NumPy, PyTorch, Pandas, NLTK, Django, Flask and SQLAlchemy.
Faust supports any type of stream data, including bytes, Unicode and serialized structures, and it also comes with “Models” that use modern Python syntax to describe how keys and values in streams are serialized.
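To make the idea of a typed model concrete, here is a minimal sketch that uses only the Python standard library. It is not Faust's API: Faust's documentation declares models as subclasses of faust.Record, while this stand-in uses a dataclass so it runs without Faust or Kafka. The Order class, its fields and the JSON encoding are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Order:
    """Hypothetical stream value; field names are illustrative only."""
    account_id: str
    symbol: str
    quantity: int

    def dumps(self) -> bytes:
        # Serialize the annotated fields to JSON bytes, as a broker would carry them.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def loads(cls, raw: bytes) -> "Order":
        # Rebuild the typed object from the bytes read off the stream.
        return cls(**json.loads(raw.decode("utf-8")))


order = Order(account_id="a-1", symbol="XYZ", quantity=10)
raw = order.dumps()
restored = Order.loads(raw)
```

The type annotations on the class double as the description of the wire format, which is the general approach the article attributes to Faust's models.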
Its concept of “agents” comes from the actor model, which means the stream processor can execute concurrently on many CPU cores, and on hundreds of machines at the same time, according to its documentation.
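The agent idea can be sketched in plain asyncio, under some loud assumptions: this is not Faust code, it runs in a single process rather than across machines, and in-memory queues stand in for Kafka partitions. The names stream, count_agent and NUM_PARTITIONS are hypothetical. Each agent instance is a coroutine that iterates over its own stream of events, and several instances run concurrently.

```python
import asyncio
import zlib

NUM_PARTITIONS = 2  # assumption: two in-memory "partitions" stand in for a broker


async def stream(queue):
    """Yield events from a queue until a None sentinel arrives."""
    while True:
        event = await queue.get()
        if event is None:
            return
        yield event


async def count_agent(events, totals):
    """Actor-style processor: consume one stream, keep private state."""
    async for symbol in events:
        totals[symbol] = totals.get(symbol, 0) + 1


async def main(symbols):
    queues = [asyncio.Queue() for _ in range(NUM_PARTITIONS)]
    totals = [{} for _ in range(NUM_PARTITIONS)]
    # Start one agent instance per partition; they run concurrently.
    workers = [
        asyncio.create_task(count_agent(stream(q), t))
        for q, t in zip(queues, totals)
    ]
    for symbol in symbols:
        # Deterministic routing so the same key always reaches the same agent.
        partition = zlib.crc32(symbol.encode("utf-8")) % NUM_PARTITIONS
        await queues[partition].put(symbol)
    for q in queues:
        await q.put(None)  # signal end of stream
    await asyncio.gather(*workers)
    merged = {}
    for t in totals:
        merged.update(t)  # keys are disjoint across partitions
    return merged


counts = asyncio.run(main(["AAA", "BBB", "AAA"]))
```

Because each key is routed to exactly one agent instance, every instance can keep its own state without locks. That property is what lets this style of processor be spread across cores or machines.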
It requires Python 3.6 or later, and it can run multiple stream processors in the same process, along with web servers and other network services. It also does not require the use of resource managers such as Yarn or Mesos.
The company uses Faust to stream all the events on the Robinhood app for purposes such as real-time data analysis, risk, fraud and security. It acts as the back end for the Robinhood feed, which is a chat, monitoring execution quality.
“It’s useful for writing back-end services, especially for companies that have iOS apps, Android apps and maybe a web app. They need a back end that does data analysis and can do all that with a single solution,” Solem said.
Jay Kreps, co-creator of Kafka, was among those who tweeted about the project:
7/ The way Faust makes things feel Pythonic is really nice! Lot of room for other interfaces like this. What's key is that all of these interfaces can talk to each other via the same Kafka streams. A stream is a stream whether you made it with SQL, Java, or Python.
— Jay Kreps (@jaykreps) August 3, 2018
Django co-creator Simon Willison was another:
Faust co-creator Ask Solem is the creator of Celery, so this project has some serious pedigree behind it. It currently requires Kafka but it's designed to support different brokers so they're actively interested in Redis Streams. https://t.co/o3XNz51j7C
— Simon Willison (@simonw) July 31, 2018
Faust was trending as one of the top Python projects on GitHub after the announcement last Tuesday and has already received more than 1,500 stars, more than 50 forks, and three pull requests from contributors outside Robinhood. It was number three on Hacker News on the day it launched.