LangStream: an Event-Driven Developer Platform for LLM Apps
DataStax has just released a new open source project called LangStream, bringing together “data streaming and generative AI.” I spoke with streaming engineer Chris Bartholomew, the lead on the project, about what LangStream contributes to the emerging AI app ecosystem, and what (if any) similarities it has to the popular LangChain project.
DataStax is over a decade old now, having first risen to prominence in the cloud native community with a data management product built on the open source NoSQL database Apache Cassandra. Nowadays, DataStax styles itself as “the real-time AI company” and so its latest products have pivoted strongly towards generative AI.
On its homepage, LangStream is described as a platform to “build and run event-driven Gen AI apps.” Bartholomew further explained that LangStream is for event-driven and streaming architectures, which makes it different from existing AI app development systems. He said these architectures are especially beneficial for generative AI applications, due to their ability to handle high data volumes and prioritize the most recent and relevant data.
“The newer, the more relevant your data, the better when you’re building your prompts and you’re sending those to the LLM,” he said.
LangStream and Vector Databases
Bartholomew says that LangStream is an “agnostic, vendor-neutral open source project,” although out of the box it supports DataStax’s vector database, Astra DB. It also supports Milvus (an open source vector database) and Pinecone.
I asked how a developer might use LangStream alongside a vector database.
He replied that there are two primary components of the workflow. Initially, data (often unstructured) is sent through a pipeline for vectorization. This involves the deployment of specialized agents which crawl websites or access documents from storage sources such as an S3 bucket, then segment this data and employ an embedding model from platforms like OpenAI or Hugging Face. The resulting data is then synced with a vector database.
The next part involves using this data in an application, such as a generative AI chatbot. Upon receiving a user query, Bartholomew explained, LangStream probes the database for relevant data (using the RAG pattern — Retrieval Augmented Generation), turns that data into a prompt for an LLM, and then invokes the language model.
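The query half of the workflow, the RAG pattern, can be sketched the same way: rank stored chunks by similarity to the user’s question, fold the best matches into a prompt, and pass that prompt to an LLM. The similarity metric and helper names below are illustrative assumptions, not LangStream internals.

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: list[float],
             store: dict[str, list[float]],
             top_k: int = 3) -> list[str]:
    """Return the stored chunks most similar to the query vector."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved context and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

The final step, invoking the language model with the built prompt, would be a single API call to whichever LLM provider the application uses.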
This indeed sounds like a useful approach to using vector databases in an application, but where does the real-time data aspect come in?
Bartholomew noted the dynamic nature of data, particularly vectorized data, which continually evolves rather than staying static. He said it’s critical to regularly re-evaluate data that’s being used in LLM apps.
“For example, […] if you’re pulling data off a website, an internal website for a private chatbot, you’re going to want to re-evaluate that for new data, as you have the data coming in.”
He added that LangStream has “an automatic pipeline that keeps evaluating for new data.”
How to Build Apps in LangStream
In terms of how developers can use LangStream as a platform to create LLM applications, I asked Bartholomew to explain how this works in practice.
He replied that LangStream operates as a development framework, offering a “no-code” approach where users can compose pipelines by configuring and combining various “agents.” But for more advanced use cases, developers can write custom agents in Python.
“So you can write any kind of bespoke code you want. We also pre-install popular Python libraries, like LangChain and LlamaIndex — that kind of stuff — into the runtime environment.”
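To make the idea of a custom agent concrete, here is a hypothetical sketch of a bespoke Python step in a pipeline. This is only meant to show the shape of the concept (an agent receives a record and emits transformed records); it is not LangStream’s actual agent interface.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A minimal stand-in for a message flowing through a pipeline."""
    value: str


class UppercaseAgent:
    """A trivial custom agent that transforms each record's payload."""

    def process(self, record: Record) -> list[Record]:
        # A real agent might call LangChain or LlamaIndex here, since those
        # libraries are pre-installed in the runtime environment.
        return [Record(record.value.upper())]
```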
He added that the runtime environment is based on Kubernetes and Apache Kafka. “We could have just written a library to pull these things together,” he said, “but we really wanted to have a reliable runtime for LangStream applications.”
The mention of LangChain made me ask whether LangStream has any similarity to the better-known “Lang” product.
He replied that LangStream is complementary to LangChain. He used the example of a prototype app created using LangChain.
“So you can take that and you can convert it and run it in LangStream. Because, like I said, LangStream is a runtime environment, not just a development environment.”
He added that you might also want to “decompose” or “recompose” a LangChain app into an event-based architecture. In other words, make it into a distributed microservices-based application.
“And then you get advantages of scalability,” he said. “It’s easy to understand how to scale that because that’s a well-understood pattern. You get robustness.”
On the security front, Bartholomew cautioned that you have to be careful interfacing with LLM systems like OpenAI “through the frontend in the browser,” because you may be exposing your private keys. He said a more secure architecture is to have a frontend that talks to a backend.
“You’ll have some authentication, that’s the method there, but you’re not exposing your keys to expensive LLM calls.”
According to Bartholomew, best practice is to “write a frontend application that talks to a backend application,” which is how DataStax has set up LangStream. He noted that it uses WebSocket gateways to communicate between the two.
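The key-protection point can be sketched as a minimal backend handler: the browser sends only the user’s question, while the API key lives in a server-side environment variable and never appears in the response. The handler and `call_llm` callable are hypothetical stand-ins, assuming an environment variable named `LLM_API_KEY`.

```python
import os


def handle_chat_request(question: str, call_llm) -> dict:
    """Backend handler: the provider API key never leaves the server."""
    api_key = os.environ.get("LLM_API_KEY", "")
    answer = call_llm(question, api_key)
    # Return only the answer; never echo the key back to the client.
    return {"answer": answer}
```

In LangStream’s architecture as Bartholomew describes it, the frontend would reach this kind of backend over a WebSocket gateway rather than a plain HTTP handler.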
One of the use cases for this approach (event-based and having a frontend talk to a backend) is what Bartholomew called “a chatty chatbot.” This is a chatbot that doesn’t just reply to your questions but can initiate the conversation and prompt you if need be.
“Today, chatbots are request-reply,” he explained. “I ask it a question and then it replies to me. It waits for me to ask a question. [But] because we’re event-driven, and we can asynchronously send messages back and forth, we can actually have the chatbot initiate a conversation. It can send you a message and say, ‘Welcome, I’m the chatbot, I do this.’ If you haven’t asked a question for a while, it can kind of try to keep the conversation going.”
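The “chatty chatbot” idea can be sketched with asynchronous messaging: because messages flow in both directions, the bot can speak first and nudge the user after a period of silence. Here `asyncio` stands in for the event-driven runtime, and the timings and messages are purely illustrative.

```python
import asyncio


async def chatty_bot(outbox: list[str], idle_seconds: float = 0.05) -> None:
    """Greet immediately, then nudge once if no user message arrives."""
    outbox.append("Welcome, I'm the chatbot. Ask me anything!")
    user_spoke = asyncio.Event()  # would be set when a user message arrives
    try:
        # Wait for the user; on timeout, proactively keep the chat going.
        await asyncio.wait_for(user_spoke.wait(), timeout=idle_seconds)
    except asyncio.TimeoutError:
        outbox.append("Still there? Happy to keep the conversation going.")
```

A request-reply chatbot cannot do this, since it only ever runs in response to an incoming question.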