With Conductor, Orkes Tackles LLM Orchestration Workflows
Amid all the hullabaloo about generative AI and large language models (LLMs), developers still have practical questions about how, exactly, to add the benefits of LLMs to their applications.
- Do I need to maintain external stacks to manage the AI side of my application, separate from where the rest of my components live?
- Do I need to write drivers to interact with vector databases for uses such as retrieval augmented generation (RAG)?
- How do I bring the human users working with AI into the picture?
- How do I ensure that enterprise applications are using LLMs in a safe, governed and observable manner while ensuring my organization’s proprietary data is not unnecessarily exposed?
Those are issues that Orkes, which offers a cloud-hosted version of the Conductor open source platform, has been tackling. Its recently released Orkes AI Orchestration provides a way for enterprises to weave model inferences and vector database tasks into their business logic. A companion feature called Human Tasks enables users to designate points in a process where humans need to provide input or oversight.
Workflows Across Microservices
Netflix built out Conductor during its explosive growth as a way to manage workflows that span microservices. It was open sourced under the Apache 2.0 license in 2016. Its creators, including Jeu George, Viren Baraiya and Boney Sekh, along with Dilip Lukose, went on to found Cupertino, California-based Orkes, which offers a managed version of Conductor.
Billed as an orchestration platform, Conductor sounds somewhat like Kubernetes. But there’s a difference. While Kubernetes automates the deployment, scaling and management of containerized applications, Conductor focuses on managing the execution, routing and coordination of tasks in a workflow.
ChatGPT explained it this way:
“While both manage different aspects of application orchestration, they can be complementary. Kubernetes can manage the deployment and scaling of applications, including microservices, while Conductor can handle the workflow and task coordination within those microservices.”
Explained Baraiya, CTO at Orkes: “Essentially, if you are building applications in the cloud, you are using some sort of eventing system or microservices-based architecture, and one problem that you have to solve is how do you coordinate your work across all different systems, maintain the state and build your application? So Conductor is designed to solve that problem.”
Workflows for Incorporating LLMs
Orkes AI Orchestration creates workflows for incorporating LLMs and vector embeddings into application development.
“When you are writing an application, and you have to wire up the calls to AI and your other services, this is where Conductor helps,” said Baraiya.
“Let’s say, before you make a call to an LLM, you [call] your database to get some data, and then augment the prompt with the data and then make a call to an LLM. So how do you sequence these things out together? Based on the output of the LLM, you want to make some determination whether it needs human intervention or not. If that’s the case, then [it’s] routed to humans, so the routing piece is another thing.
“So routing, wiring things up together. As a developer, you don’t have to worry about [writing these individual tasks],” he said. And you don’t have to build a custom UI to visualize these things, because the UI comes out of the box with Conductor.
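The sequence Baraiya describes — fetch data, augment the prompt, call the model, then route on the result — can be sketched in plain Python. This is a conceptual illustration of the workflow shape, not Conductor's API; every function name here is a hypothetical stand-in.

```python
# Sketch of the RAG-style sequence described above: fetch context,
# augment the prompt, call the LLM, then route based on the output.
# All functions are hypothetical stand-ins, not Conductor APIs.

def fetch_context(query: str) -> str:
    """Stand-in for a database or vector-search lookup."""
    return "Refunds are processed within 5 business days."

def call_llm(prompt: str) -> dict:
    """Stand-in for a text-completion call; returns text plus a confidence score."""
    return {"text": "Your refund arrives within 5 business days.", "confidence": 0.62}

def run_workflow(query: str, review_threshold: float = 0.8) -> str:
    context = fetch_context(query)                       # 1. get data
    prompt = f"Context: {context}\n\nQuestion: {query}"  # 2. augment the prompt
    result = call_llm(prompt)                            # 3. call the LLM
    if result["confidence"] < review_threshold:          # 4. route on the output
        return "routed_to_human"
    return "auto_answered"

print(run_workflow("When will my refund arrive?"))  # low confidence -> routed_to_human
```

In Conductor, each numbered step would be a separate task in the workflow definition, with the routing step expressed as a switch/decision task rather than an `if` statement.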
“And let’s say for whatever reasons the LLM fails — you get a rate limit error or something like that — it does automatic retries for you, you don’t have to worry about it. …In reality, there are going to be errors, there are going to be issues. This is where the system takes care of all of this for you automatically.”
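The automatic retry behavior Baraiya mentions — absorbing transient failures like rate-limit errors — boils down to retrying with backoff. A plain-Python illustration of that pattern (not Conductor's actual implementation, which handles this at the platform level):

```python
import time

class RateLimitError(Exception):
    """Simulates an HTTP 429 from an LLM provider."""
    pass

def call_with_retries(task, max_attempts=3, base_delay=1.0):
    """Retry a flaky task with exponential backoff -- the kind of error
    handling Conductor performs automatically for workflow tasks."""
    for attempt in range(max_attempts):
        try:
            return task()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s...

# Simulate an LLM call that is rate-limited once, then succeeds.
calls = {"n": 0}
def flaky_llm_call():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RateLimitError("429 Too Many Requests")
    return "completion text"

print(call_with_retries(flaky_llm_call, base_delay=0.01))  # prints "completion text"
```

The point of the platform doing this for you is that none of this retry plumbing lives in your application code.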
With the new Orkes AI Orchestration product, enterprises can:
- Easily add language models or ML inferences from model providers including OpenAI, Hugging Face, Azure OpenAI, Google Vertex AI and Amazon Bedrock.
- Integrate with vector database providers such as Pinecone and Weaviate, not only to store embeddings but also to provide vector search.
- Create granular role-based access to models and vector databases so that if sensitive information is stored as embeddings, you can designate only members of the finance team or certain members of that team, for instance, to have access to that information.
It also offers:
- Prompt templates to experiment and test prompts visually.
- The ability to weave in model interactions and vector database system tasks, including LLM Text Complete to send a prompt to a text completion model and LLM Search Index, which uses natural language to find similar embeddings.
- Comprehensive and auditable governance with data about every interaction between Orkes Conductor and a model published to a queue with change data capture.
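The system tasks named above can be chained in an ordinary Conductor workflow definition. Below is a sketch of one, expressed as a Python dict ready to serialize to JSON: an LLM Search Index task retrieves similar embeddings, and an LLM Text Complete task answers using the retrieved context. The task-type strings, input parameter names and integration names (`pinecone_prod`, `openai`) are assumptions based on the feature names in this article, not the exact Orkes schema; consult the Orkes documentation before using.

```python
import json

# Hypothetical workflow definition chaining the two LLM system tasks
# mentioned above. Field names name/taskReferenceName/type/inputParameters
# follow Conductor's workflow-definition conventions; the LLM-specific
# parameters are illustrative assumptions.
workflow = {
    "name": "rag_answer",
    "version": 1,
    "tasks": [
        {
            "name": "search_index",
            "taskReferenceName": "search_ref",
            "type": "LLM_SEARCH_INDEX",          # natural-language vector search
            "inputParameters": {
                "vectorDB": "pinecone_prod",     # assumed integration name
                "index": "support_docs",
                "query": "${workflow.input.question}",
            },
        },
        {
            "name": "complete_text",
            "taskReferenceName": "complete_ref",
            "type": "LLM_TEXT_COMPLETE",         # prompt a text-completion model
            "inputParameters": {
                "llmProvider": "openai",         # assumed integration name
                "model": "gpt-3.5-turbo-instruct",
                "promptName": "answer_with_context",  # a saved prompt template
                "promptVariables": {
                    "context": "${search_ref.output.result}",
                    "question": "${workflow.input.question}",
                },
            },
        },
    ],
}

print(json.dumps(workflow, indent=2)[:60])
```

Note how the second task wires its input to the first task's output via the `${search_ref.output.result}` expression — the "wiring things up together" Baraiya describes, declared in data rather than code.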
With Human Tasks, the workflow pauses until the human interaction is completed. The developer can specify who is responsible for a certain task and the time period in which it is to be completed. If the task is not completed in time, Conductor reassigns it to the next person in the chain and notifies them; this continues until the task is either completed or times out.
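The escalation behavior can be sketched as a loop over an assignee chain. This is a plain-Python illustration of the logic just described, not the Orkes Human Tasks API; timeouts are modeled simply as an assignee failing to respond.

```python
# Sketch of Human Task escalation: each assignee is notified in turn;
# if they don't complete the task within their window (modeled here as
# not appearing in completed_by), it moves to the next person in the
# chain until someone finishes or the chain is exhausted.

def run_human_task(chain, completed_by, notify):
    """chain: ordered assignees; completed_by: who actually responds;
    notify: callback fired whenever the task is (re)assigned."""
    for assignee in chain:
        notify(assignee)
        if assignee in completed_by:
            return f"completed_by:{assignee}"
    return "timed_out"

notified = []
result = run_human_task(
    chain=["analyst", "team_lead", "manager"],
    completed_by={"team_lead"},
    notify=notified.append,
)
print(result)    # completed_by:team_lead
print(notified)  # ['analyst', 'team_lead'] -- the manager was never needed
```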
Language Agnostic, Across Clouds
“[The challenge] especially with large language models, LLMs, is that when you want to incorporate elements in your application flow, you have to orchestrate across those elements. And it’s not just that you use one LLM from OpenAI or Google’s Bard or Llama 2. And when you build an application, it typically becomes part of your application stack, where you have microservices calls to your database, and then you have calls to your LLM. So they are just another API call in the end. But it requires some more effort and consideration,” Baraiya said.
“So it decouples you from having to work with a very specific language and [meets] you where your application stack is designed,” he said.
The company also is heavily invested in visibility.
“When you are incorporating LLMs, you have to understand what inputs you are giving, what outputs are produced, what’s your entire application flow. So one advantage of using Conductor is that it is a fully visual system. It can take your code and visualize the entire code graph into a diagram. So exactly what path was taken, why an LLM took a certain action, and so forth,” he said.
It’s also agnostic to your cloud deployment. And using techniques like caching can help keep costs down.
“Let’s say if you want to leverage Google’s Bard or Vertex AI, for example. It only runs in GCP (Google Cloud Platform). But the rest of the application runs in AWS. How do you connect them together? [Conductor] is a completely cross-cloud system. So part of your system could be running in AWS, when it comes to making calls to Vertex AI in GCP. That part alone runs in GCP. And then everything gets connected very seamlessly,” he said.
Orkes also has built a security and governance model to allay enterprise concerns.
“Who in the company will have access to and who can see that, who can delete that, who can create that, and in both your development test and production environments, right?” said George, the Orkes CEO. “That creates a comfort for enterprises saying that, ‘OK, this now restricts usage, and you can get the best out of it there.’”
“[And] you have maybe three or five or 10 different models that you expose. It gives engineers the ability to see which model is actually enough for their usage. So [there] might be a very cost-efficient model that might work for use case one, but may not work for use case two. And there you can use an expensive model. So you can actually manage costs that way as well. And the visibility piece, on how many executions are run and stuff like that, can help you manage costs also.”