LLM App Ecosystem: What’s New and How Cloud Native Is Adapting

The developer ecosystem for AI-enabled applications is beginning to mature, following the emergence over the past year of tools like LangChain and LlamaIndex. There’s now even a term for AI-focused developers: AI engineer, the next step up from “prompt engineer,” according to its chief proselytizer, Shawn @swyx Wang. He’s created a nifty diagram showing where AI engineers fit into the wider AI and development ecosystems:

Via swyx.
A large language model (LLM) is the core technology for an AI engineer. It’s no coincidence that both LangChain and LlamaIndex are tools that extend and complement LLMs. But what other tools are available to this new class of developer?
The best diagram of an LLM stack I’ve seen so far comes from the VC firm Andreessen Horowitz (a16z). Here’s its view of an “LLM app stack”:
The All-Important Data Layer
Needless to say, the most important thing in an LLM stack is the data. In a16z’s diagram, that’s the top layer. The “embedding model” is where the LLM comes in — you can choose from OpenAI, Cohere, Hugging Face, or one of a few dozen other LLM options, including the increasingly popular open source LLMs.
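To make that layer a little more concrete, here is a minimal sketch of generating embeddings with the open source sentence-transformers library and a small Hugging Face model (my illustrative choices, not a16z’s); any of the providers above could fill the same slot.

```python
# A minimal sketch of the "embedding model" layer, using the open source
# sentence-transformers library (the model choice is illustrative).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small model hosted on Hugging Face

docs = [
    "Kubernetes is an open source container orchestration platform.",
    "LangChain is a framework for building LLM-powered applications.",
]

# Each document becomes a high-dimensional vector (384 floats for this model).
embeddings = model.encode(docs)
print(embeddings.shape)  # (2, 384)
```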
But even before you get to LLMs, a16z implies that you need to set up a “data pipeline” — it lists Databricks and Airflow as two examples, or you could just go “unstructured” with your data. Not mentioned by a16z, but fitting (I think) into this part of the data cycle, are tools that help enterprises “clean” or simply curate data before it is fed into a custom LLM. So-called “data intelligence” companies like Alation offer this type of service — it’s a cousin of the better-known “business intelligence” category of tools in the enterprise IT stack.
The final part of the data layer is a class of tools allowing you to store and process your LLM data — the vector database. According to Microsoft’s definition, this is “a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.” The data is stored as a vector via a technique called “embedding.”
When I spoke to leading vector database vendor Pinecone back in May, the company pointed out that its tool is often used alongside data pipeline tools like Databricks. In such cases, the data usually resides elsewhere (a data lake, for instance) and is transformed into embeddings by running it through a machine learning model. Once the data has been processed and chunked, the resulting vectors are sent to Pinecone.
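Here is a rough sketch of that flow (chunk the raw data, embed each chunk, then send the vectors to Pinecone). It assumes the 2023-era pinecone-client Python library, a pre-created index named “docs” and a local file standing in for the data lake, all of which are my illustrative assumptions rather than anything Pinecone specified.

```python
# Illustrative pipeline: chunk raw text, embed each chunk, send vectors to Pinecone.
# Assumes the 2023-era pinecone-client; exact calls vary by client version.
import pinecone
from sentence_transformers import SentenceTransformer

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")  # placeholder values
index = pinecone.Index("docs")  # assumes an index named "docs" already exists

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines often split on sentences or tokens."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = open("report.txt").read()  # e.g. exported from a data lake
chunks = chunk(document)
vectors = model.encode(chunks)

# Upsert (id, vector, metadata) tuples so the original text can be retrieved later.
index.upsert(vectors=[
    (f"report-{i}", vec.tolist(), {"text": chunks[i]})
    for i, vec in enumerate(vectors)
])
```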
Prompts and Queries
The next two layers can be summarized as prompts and queries — it’s where an AI application interfaces with an LLM and (optionally) other data tools.
A16z positions both LangChain and LlamaIndex as “orchestration frameworks,” meaning tools that developers can use once they have chosen which LLM to use.
According to a16z, orchestration frameworks like LangChain and LlamaIndex “abstract away many of the details of prompt chaining,” which means querying and managing data between an application and the LLM(s). Included in this orchestration process is interfacing with external APIs, retrieving contextual data from vector databases, and maintaining memory across multiple LLM calls.
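As a minimal sketch of what that abstraction looks like in practice, here is a small prompt chain assuming the 2023-era LangChain Python API (PromptTemplate, LLMChain, and the OpenAI wrapper); the class names have shifted in later releases.

```python
# A small prompt chain in LangChain: the framework formats the prompt,
# calls the LLM, and returns the completion. (2023-era API.)
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the context below.\n"
        "Context: {context}\n"
        "Question: {question}"
    ),
)

chain = LLMChain(llm=llm, prompt=prompt)

# In a real app, `context` would come from a vector database query (see above)
# and the chain might also carry conversation memory across calls.
answer = chain.run(context="Pinecone stores embeddings.", question="What does Pinecone store?")
print(answer)
```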
The most intriguing box in a16z’s diagram is “playground,” which includes OpenAI, nat.dev, and Humanloop. A16z doesn’t define what this is in the blog post, but we can deduce that “playground” tools help the developer do what a16z calls “prompting jiu-jitsu.” These are places where developers can try various prompting techniques.
Humanloop is a British company, and one of the features of its platform is a “collaborative prompt workspace.” It further describes itself as a “complete developer toolkit for productionizing your LLM features.” So basically it allows you to try out LLM features and then deploy them to an application if they work. (I’ve reached out to the company to set up an interview, so I will be writing more about this separately.)
LLM Ops
To the right of the orchestration box are a host of operational boxes, including LLM cache and validation. There are also a bunch of cloud and API services related to LLMs, including open model repositories like Hugging Face and proprietary API providers like OpenAI.
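To illustrate the “LLM cache” box, here is a minimal sketch of caching responses keyed on the prompt; this is my own illustration of the general idea, not any particular vendor’s implementation (production caches often match on embedding similarity rather than exact strings).

```python
# Minimal illustration of an LLM cache: identical prompts skip the (slow, paid)
# API call and return the stored completion instead.
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_llm) -> str:
    """call_llm is any function that takes a prompt and returns a completion."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]

# Usage: the second call with the same prompt is served from the cache.
fake_llm = lambda p: f"echo: {p}"
print(cached_completion("What is a vector database?", fake_llm))
print(cached_completion("What is a vector database?", fake_llm))
```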
This is perhaps where the stack most resembles the developer stack we’ve become accustomed to in the “cloud native” era, and it’s no coincidence that a number of DevOps companies have added AI to their list of offerings. In May I spoke to Harness CEO Jyoti Bansal. Harness runs a “software delivery platform” that focuses on the “CD” part of the CI/CD process [continuous integration and continuous delivery/continuous deployment].
Bansal told me that AI can alleviate the tedious and repetitive tasks involved in the software delivery lifecycle, from generating specifications based on existing features to writing code. He also said that AI can automate code reviews, vulnerability testing, bug fixing, and even the creation of CI/CD pipelines for builds and deployments.
AI is also changing developer productivity, according to another conversation I had in May. Trisha Gee from Gradle, the build automation tool, told me that AI can accelerate development by reducing the time spent on repetitive tasks — like writing boilerplate code — and enabling developers to focus on the bigger picture, such as ensuring the code meets business requirements.
Web3 Is Dead, Long Live the AI Stack
What we’ve seen so far in the emerging LLM developer stack is a bunch of new product types — such as the orchestration frameworks (LangChain and LlamaIndex), vector databases, and “playground” platforms like Humanloop. All of them extend and/or complement the underlying core technology of this era: large language models.
But we’ve also witnessed nearly all companies from the cloud native era adapting their tools to the AI engineer era. That augurs well for the future evolution of the LLM stack. The phrase “standing on the shoulders of giants” springs to mind: the best innovation in computer technology invariably builds on what came before. Perhaps that’s what undid the failed “Web3” revolution, which wasn’t so much building atop the previous generation as trying to usurp it.
This new LLM app stack is different; it’s a bridge from the cloud development era to a newer, AI-based developer ecosystem.