Flyte: An Open Source Orchestrator for ML/AI Workflows
Does data for artificial intelligence and machine learning need its own workflow and orchestration system? It does, according to Union.ai, which offers an open source solution called Flyte that provides workflow orchestration built for the unique demands of data rather than software.
“The number one feedback we get from people who use orchestrators for machine learning is that they’re not made for AI workflows, machine learning workflows, because you’re forced to write YAML code, you’re forced to understand Dockerfiles,” Martin Stein, chief marketing officer and head of developer relations at Union.ai, told The New Stack. “You’re forced to really do things that machine learning engineers, data scientists and researchers don’t do.”
Basically, with Flyte, developers write their code and then run it locally or remotely, he added.
“Our thesis is that software is fundamentally different from machine learning, although they are related,” said Niels Bantilan, chief machine learning (ML) engineer at Union.ai. “The main difference between software and machine learning, in our opinion, is that software is stateless. … On the other hand, data and models change all the time.”
Flyte as an Orchestrator for Machine Learning
“What is orchestration?” Bantilan asked rhetorically. “To make a new music analogy, the conductor in an orchestra will be the central point of coordination that tells each section, each instrument, when to play, how to play and at what dynamics. Essentially, it’s a software orchestrator. A workflow orchestrator is quite similar at that level of abstraction, where it’s coordinating when certain computations are done, where certain data is being pulled from, where it’s being pushed to, and essentially coordinating this whole system to achieve some desired behavior.”
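To make the conductor analogy concrete, here is a minimal sketch of the coordination idea in plain Python. It is not Flyte code: the task names, the TASKS table and the run_pipeline helper are all hypothetical, and the standard library's graphlib stands in for a real scheduler. Each task declares which upstream outputs it consumes, and the runner works out when each computation can happen.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: ordinary functions with no orchestration logic inside.
def load_data():
    return [1.0, 2.0, 3.0, 4.0]

def compute_mean(data):
    return sum(data) / len(data)

def center(data, mean):
    return [x - mean for x in data]

# Each entry maps a task name to (function, upstream tasks whose outputs it consumes).
TASKS = {
    "load_data": (load_data, []),
    "compute_mean": (compute_mean, ["load_data"]),
    "center": (center, ["load_data", "compute_mean"]),
}

def run_pipeline(tasks):
    """Run every task after its upstream tasks and pass their outputs along."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results = {}
    order = []
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[dep] for dep in deps))
        order.append(name)
    return order, results

order, results = run_pipeline(TASKS)
print(order)
print(results["center"])
```

The functions themselves know nothing about ordering or data movement; that knowledge lives in the dependency table, which is the part an orchestrator takes over.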
Flyte, he contended, solves these problems with rich tooling.
It is Union’s position that a good data and artificial intelligence orchestrator provides:
- Management and security: RBAC, data ownership, multitenancy, and scheduling
- Monitoring and visualization: Data lineage, data visualization, workflow visualization, task-level observability
- Performance and accuracy: Strongly typed interface, GPU acceleration, parallelism, signaling
- Workflow efficiency: Intra-task checkpointing, recovering from failures, rerunning a single task, caching, spot/preemptible instances, timeout, dynamic resource allocation, notifications
- Flexibility: Intra-task checkpointing, versioning, dependency isolation, multicloud support
Not coincidentally — since Union came up with the list — Flyte addresses each of those bullet points.
“Most orchestrators don’t do what Flyte does,” explained Stein. “Flyte is one of the very few orchestrators that actually go beyond what data-only orchestrators like Airflow do. For example, Airflow doesn’t have caching, doesn’t have intra-task checkpointing, is really not built for ML pipelines. It’s built for data pipelines.”
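As a rough illustration of what task-level caching means, the sketch below reuses a stored result when a task is called again with the same inputs and the same version string. It is a plain-Python stand-in under stated assumptions, not Flyte's implementation: the cached_task decorator, the in-memory store and the train function are all hypothetical.

```python
import functools
import hashlib
import pickle

def cached_task(version):
    """Hypothetical decorator: reuse a result when inputs and version match."""
    def decorator(func):
        store = {}  # in-memory stand-in for a persistent cache

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The cache key covers the task name, its version and its inputs.
            payload = pickle.dumps(
                (func.__name__, version, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(payload).hexdigest()
            if key in store:
                wrapper.hits += 1
            else:
                wrapper.misses += 1
                store[key] = func(*args, **kwargs)
            return store[key]

        wrapper.hits = 0
        wrapper.misses = 0
        return wrapper
    return decorator

@cached_task(version="1.0")
def train(learning_rate, epochs):
    # Stand-in for an expensive training run.
    return round(learning_rate * epochs, 6)

first = train(0.1, epochs=10)   # computed
second = train(0.1, epochs=10)  # served from the cache
third = train(0.2, epochs=10)   # new inputs, computed again
print(train.hits, train.misses)
```

Changing the version string invalidates earlier entries, which is one simple way to force a recomputation after the task's logic changes.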
Flyte works by automating the hard infrastructure challenges.
“Stuff like parallelism and GPU: You don’t have to write any functions specifically for Flyte,” Stein said. “This is really important, because Flyte does this automatically under the hood for you, so you don’t have to put in ‘Flyte, please run in parallel, yada, yada, yada’ in your Python code, and the decorators really specify on a task level what machine you want to run on, or how much compute you need.”
The cloud native orchestration platform is built on top of Kubernetes and requires help from a Kubernetes engineer if you run Flyte on your own.
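The decorator pattern Stein describes can be sketched in a few lines. The example below is a stand-in written with the standard library only, so the task decorator, the ResourceRequest class and the describe helper are hypothetical names rather than Flyte's API. The point it illustrates is that compute requirements are attached to a task as metadata while the function body stays ordinary Python.

```python
import functools
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical per-task compute request."""
    cpu: str = "1"
    mem: str = "1Gi"
    gpu: str = "0"

def task(requests=ResourceRequest()):
    """Hypothetical decorator: attach resource metadata, leave behavior unchanged."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper.requests = requests  # a scheduler would read this, not the function
        return wrapper
    return decorator

@task(requests=ResourceRequest(cpu="4", mem="16Gi", gpu="1"))
def train_model(n_samples: int) -> float:
    # Ordinary Python; nothing here mentions GPUs or machines.
    return 1.0 / n_samples

@task()
def summarize(score: float) -> str:
    return f"score={score:.2f}"

def describe(tasks):
    """Collect what each task asks for, as a scheduler might before placing it."""
    return {t.__name__: t.requests for t in tasks}

plan = describe([train_model, summarize])
report = summarize(train_model(4))
print(plan["train_model"])
print(report)
```

Run locally, the decorated functions behave exactly like the undecorated ones, which matches the write-once, run-locally-or-remotely workflow described earlier in the article.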
It’s Not an MLOps Tool but…
Flyte is often mistaken for an MLOps tool, Stein said. It is not.
“We run MLOps on top of Flyte so you can bring your Weights & Biases or your whylogs or whatever you want, and we basically connect those things together and make them work flawlessly. That’s really what the power of an orchestrator is,” Stein said.
It does, however, allow you to see the full machine learning workflow, he added, where one workflow hooks into another. This ensures that data scientists can see what’s happening across everything from the beginning to the end, Stein said.
“You might have a data team and a classification model team, a forecasting model team, and they can all use the same platform, which is Flyte, and work together in the same workspace but still not stepping on each other’s toes,” Bantilan said.
“We don’t have access to any of your data — that’s really the most important thing,” Stein said.