Where are you using WebAssembly?
Wasm promises to let developers build once and run anywhere. Are you using it yet?
At work, for production apps
At work, but not for production apps
I don’t use WebAssembly but expect to when the technology matures
I have no plans to use WebAssembly
No plans and I get mad whenever I see the buzzword
AI / Large Language Models / Software Development

4 Key Tips for Building Better LLM-Powered Apps

Here’s some advice and techniques for improving the accuracy of LLM apps, along with considerations for choosing the right LLM.
Dec 5th, 2023 10:44am by
Featued image for: 4 Key Tips for Building Better LLM-Powered Apps
Image from Rusha on Shutterstock.

In the year since OpenAI released its first ChatGPT model, there’s been an explosion of interest in generative AI. Large language model (LLM)-powered apps are now at the forefront of how businesses are thinking about productivity and efficiency, and the tools and frameworks for building generative AI apps have expanded greatly. But there are still concerns about the accuracy of generative AI outputs, and developers need to quickly learn to deal with this and other issues to build powerful, trustworthy apps.

Here’s some advice and techniques for improving the accuracy of LLM apps, along with considerations for choosing the right LLM. We can’t deal with these issues exhaustively as each is complex in its own right, but we can offer some advice to get started which you can explore further.

Streamlit, a free and open source framework to rapidly build and share machine learning and data science web apps, recently published a report analyzing more than 21,000 LLM apps built by over 13,000 different developers on the Streamlit Community Cloud. It provides insights into some of the tools and techniques developers have been using to date to build their apps, which informs some of the advice below.

For example, vector retrieval tools can be effective for improving contextual recommendations for LLM-powered apps, but our survey found that a minority of developers are using vector capabilities today, representing a large opportunity for the future.

As more developers harness the power of generative AI for app development, we’ll start to see apps across categories and industry verticals start to have AI-based search built in, alongside conversational and assistive experiences. Below are my four tips for developers to help them build better LLM-powered apps, so they can bring true disruption to their organizations.

Employ Agents and Orchestration for Smarter Apps

Orchestration frameworks like LangChain and LlamaIndex can help supplement your model using additional tools, or agents, that augment the capabilities of your LLM-based app. In this context, think of the agent as a plug-in system that allows you to build additional functionality into the app, expressed in natural language.

These agents can be combined to manage and optimize LLM functions, such as refining AI reasoning, addressing biases and integrating external data sources. The agents can also provide a way for the LLM to reflect on whether it’s making an error and the steps it must take to successfully complete a task.

For an analogy, think about how a developer writes an API that delivers a certain function and the documentation that describes it: The API is expressed as code and the documentation is in natural language. Agents work in a similar way, except the documentation is provided for the benefit of the LLM, not other developers. So the LLM looks at the task at hand, looks at the agent’s documentation and determines whether the agent can help it complete its task.

These agents also add robustness to LLM apps by providing a way for the app to reflect on its own mistakes and correct them. For example, suppose an LLM app writes some SQL code to perform a task, like check inventory levels in a database, but it makes an error in the code. With a standard, “naive” LLM app, the error is the end of the road.

However, if the app has an agent that executes SQL, it can look at the error and use the agent to determine what it should have done differently and then correct the error. This might be something as simple as a small change in syntax, but without the agent, the LLM has no way to reason through its mistake.

Use Vector Magic and RAG to Fight Hallucinations

Sometimes the LLM you’re using won’t have access to all the information it needs to complete the intended tasks. Additional information can be injected at prompt time, but most LLMs place limits on the size of these prompts. To get around these limits, the LLM may need to query an external database using vectors, a technique called retrieval augmented generation (RAG).

To understand what RAG can do for an LLM app, it’s helpful to think about three different levels of LLM apps.

  • Level 1: The app can generate results using the knowledge already within the LLM.
  • Level 2: The app requires additional information that can be injected at prompt time. This is fairly straightforward as long as you can stay within the prompt limits.
  • Level 3: The LLM needs to reach out to an external source of information, such as a database, to complete the task.

Level 3 is where RAG comes in, and the external database is usually semantically indexed with vectors, which is why you may have heard a lot lately about vector databases and vector search tools.

Apps with vector databases and vector search enable fast, contextual search by categorizing large, unstructured datasets (including text, images, video or audio). This can be incredibly effective for making faster, stronger contextual recommendations. But vector tools are still not widely used. The Streamlit survey found that only 20% of gen AI-powered apps used some form of vector technology.

Chatbots Give Users a Powerful Way to Refine Queries

Chatbots brought generative AI into the mainstream, but there’s been some skepticism about whether they will be an effective interface moving forward. Some have argued that chatbots give the user too much freedom and not enough context about how an LLM app can be used. Others are put off by failures in the past: Clippy was a disaster, so why should chatbots succeed today?

Obviously, whether a chatbot is appropriate depends partly on the intended use of the app. But chatbots have at least one very useful quality that should not be overlooked: They provide a simple, intuitive way for users to add context and refine answers through a fluid, human-like interface.

To understand why this is powerful, think about search engines. There’s typically no way for a user to refine a search engine query; if the results are slightly off, there’s no way to tell the search engine to “try again but exclude answers about X,” for example, or “give more weight to Y.” That would be a convenient and powerful capability, and it’s one that chatbots provide for LLM apps.

The survey found that 28% of generative AI apps built in Streamlit were chatbots, versus 72% that generally did not allow for conversational refinement. On the flip side, the survey shows that weekly usage of those chatbots rose to almost 40%, while the usage of non-chatbot apps declined. So it may be that chatbots are a preferred interface for end users. The report includes examples of apps with different modes of accepting text input, so you can take a look and see what’s possible.

Consider Alternatives to GPT, Including Open Source LLMs

The foundational GPT models are still the best-known LLMs and they are very capable, but more options have emerged in the past year and some may be more suitable for your app. Factors to consider include the breadth of knowledge required from the LLM, the size of the LLM, your training needs and budget, and whether it’s important to you if the LLM is open source or proprietary. As with many things in tech, there are trade-offs.

If you’re building a generative AI app for internal use, you may need to train that LLM on internal corporate data. For most enterprises, sharing sensitive data with a public LLM is a non-starter for security reasons, so many companies run LLMs within their existing cloud security perimeter. This often leads them toward smaller LLMs, such as AI21 and Reka.

Very large LLMs also tend to have higher latencies and are typically more expensive to run because of the computing resources required. If the app is performing a relatively simple task, such as translating text or summarizing documents, a smaller LLM may work well and cost significantly less to use and operate.

You may also have reasons for preferring an open source LLM, such as Meta’s LLaMA, over a proprietary LLM like those from OpenAI, Anthropic or Cohere, where the source code, training data, weights or other model details are typically not publicly disclosed. Open source LLMs require self-hosting or inferencing through a hosting provider, but the source code and other model details are more readily available.

Get Started with Generative AI Today

Generative AI is still a rapidly emerging field, but the tools and technologies required are advancing quickly and there are many options to get started today. Developers who seize this opportunity can provide great value for their organization, incorporating AI apps as a regular feature for daily business operations and tasks. As generative AI continues to reshape roles and responsibilities across organizations, developers that lean in and become experts in LLM-powered apps will come out on top, and the above advice should help set you on the right track to get started.

Group Created with Sketch.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.