
The Role of SQL in LLM Apps and How Snowflake Uses LangChain

In an interview with Adrien Treuille, we discuss building data apps with LLMs and SQL, using LangChain, and why Snowflake loves Code Llama.
Sep 12th, 2023 8:51am

Meta’s recent release of Code Llama, a large language model (LLM) for code generation, prompted the data cloud company Snowflake to evaluate Code Llama’s performance on SQL code generation. It found that “Code Llama models outperform Llama2 models by 11-30 percent accuracy points on text-to-SQL tasks and come very close to GPT4 performance.” Snowflake also discovered that by fine-tuning Code Llama, it could make it up to 50 percent accuracy points better.

To find out more about Snowflake’s plans for SQL in the generative AI era, and why it’s suddenly all-in on Code Llama, I spoke to Adrien Treuille, director of product management and head of Streamlit (a Python app builder that was acquired by Snowflake in March 2022).

Riding First Class with SQL and Python

Treuille began by noting that Streamlit’s Community Cloud is currently host to over 10,000 LLM-powered apps, so it’s already become a leading platform for LLM app developers. “It’s a linchpin of Snowflake’s app strategy as well,” he added.

When it comes to connecting LLMs with Snowflake’s extensive data platform, SQL is the glue. “Snowflake was built on SQL,” said Treuille, “and so all functionality is available in SQL as a first-class citizen.” SQL, of course, enables you to add structure to massive swathes of data. Also, as Treuille put it, Snowflake’s “original market was database admins, people who basically speak SQL for a living.”

As for Streamlit, it was built on the back of Python. Now that Snowflake owns Streamlit, Python has also become a first-class language in the company.

“It means that, basically, all functionality [in Snowflake] has first-class Python bindings,” Treuille explained. “And of course, in Python, you can call SQL if you need an escape hatch down into the bowels of Snowflake. So yes, we are committed to both Python and SQL as being the languages of Snowflake.”
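The "escape hatch" Treuille describes, dropping from Python into raw SQL, can be sketched in a few lines. This is only an illustration of the pattern, not Snowflake's API: it assumes Python's built-in `sqlite3` module as a stand-in for a Snowflake connection, and the `orders` table and `top_customers` helper are hypothetical names made up for the example.

```python
import sqlite3


def top_customers(conn, limit):
    """Run a parameterized SQL query from Python and return the rows."""
    sql = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


# In-memory SQLite database standing in for a real data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("globex", 75.5), ("acme", 30.0)],
)

rows = top_customers(conn, 1)
print(rows)
```

The application logic stays in Python, while the aggregation itself is expressed in SQL and pushed down to the database.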

Building a Structured Data App with LLMs and SQL

A developer might decide to use Snowflake to build an LLM app when the data they’re accessing and querying is so complex that it needs further structure before it can be used in an application. Usually, this means both an LLM and at least one external data source are involved. That external data could be stored in Snowflake and/or elsewhere, such as in a vector database.

Treuille said that apps like a customer support chatbot or a “product suggestion bot” are good examples of the type of apps typically built on Snowflake using this “combination of LLMs and structured search.”

In a demo entitled “Building an LLM-Powered Chatbot,” at the Snowflake Summit 2023 in late June, Treuille showed how interacting with a Streamlit chatbot app in natural language can generate and run SQL queries on a data store in Snowflake.

“We now have a chatbot that is actually creating SQL on the fly based on our natural language input, running the SQL query and generating the response inline in our custom chatbot,” he said in the demo (see screenshot below).

Screenshot: Snowflake LLM app. SQL is generated and run by the LLM chatbot.
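The flow described in the demo (natural language in, SQL generated, query run, result returned) might look roughly like the sketch below. It is not the code from Treuille's demo. The `generate_sql` function is a hypothetical stub standing in for a real LLM call, `sqlite3` stands in for Snowflake, and the `products` table is invented for the example. The read-only check is an added precaution for illustration, not something the article describes.

```python
import sqlite3

SCHEMA = "CREATE TABLE products (name TEXT, category TEXT, price REAL)"


def generate_sql(question, schema):
    """Hypothetical stand-in for an LLM that turns a question into SQL.

    A real app would send the question and the schema to a model instead
    of matching keywords.
    """
    q = question.lower()
    if "cheapest" in q:
        return "SELECT name, price FROM products ORDER BY price ASC LIMIT 1"
    if "how many" in q:
        return "SELECT COUNT(*) FROM products"
    raise ValueError("stub cannot translate this question")


def is_read_only(sql):
    """Rough guard: accept a single SELECT statement only."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(conn, question):
    """Generate SQL for the question, run it, and return both SQL and rows."""
    sql = generate_sql(question, SCHEMA)
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    rows = conn.execute(sql).fetchall()
    return {"sql": sql, "rows": rows}


# In-memory database standing in for the data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("notebook", "paper", 3.0),
        ("pencil", "writing", 0.5),
        ("stapler", "desk", 8.25),
    ],
)

result = answer(conn, "What is the cheapest product?")
print(result["sql"])
print(result["rows"])
```

Returning the generated SQL alongside the rows mirrors the demo's approach of showing the query inline in the chat, so the user can see what was actually run.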

Why Code Llama Is So Important

It makes perfect sense that Snowflake would want to promote SQL code generation in LLMs, but why is it so excited about Meta’s new Code Llama LLM in particular?

“Six months ago, there was a fear that you were either one of the two or three superpowers who could build hyper-intelligent LLMs — like OpenAI — and there was everyone else,” Treuille replied. “And you either went to VCs and raised billions of dollars — like, you know, Anthropic — or you would inevitably be a customer and ultimately disintermediated by these massive super-intelligent LLMs from others.”

But now, he continued, “Facebook has completely abandoned that paradigm, by open sourcing some of the most powerful LLMs in the world.”

So Snowflake is, essentially, hitching its wagon to the open source LLMs being released by Meta (and perhaps others later). Snowflake can fine-tune an LLM like Code Llama to suit its own purposes — in this case, so that it does text-to-SQL better. It means the company doesn’t have to rely on a proprietary LLM provider, like OpenAI, because it can build its own LLMs from Meta’s open sourced models.

“Snowflake’s LLMs are near GPT level on standard tasks,” said Treuille, adding that “anyone can benchmark this.” In other words, he’s saying that its fine-tuned Code Llama LLM is “near” the quality of OpenAI’s GPT on tasks like text-to-SQL. “And that is totally game-changing,” Treuille insisted.

Other Parts of the LLM App Ecosystem

In addition to creating its own fine-tuned LLMs, Snowflake plays nicely with other parts of the LLM app ecosystem, said Treuille. He added that not only is Snowflake “compatible with vector databases,” but it is “in private preview for our own vector database product.” This isn’t surprising, given how many different product types are already represented in Snowflake’s platform.

Perhaps more interesting is how Snowflake works alongside LangChain, the LLM orchestrator that has been a core part of many early LLM applications. During the presentation that Treuille and a couple of colleagues gave at Snowflake Summit 2023, the group demonstrated how LangChain can be used to “help us organize the LLM’s thoughts so that it actually can decide the strategy it wants to take to solve a problem.”

In the example that was demoed, LangChain (which we were told was using GPT-4) acted as a kind of facilitator between the user and the SQL queries that the main LLM was generating.

Screenshot: Snowflake and LangChain co-ordination.
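The facilitator role described above, where an orchestrating layer decides on a strategy before anything is run, can be illustrated with a small plain-Python sketch. This deliberately does not use LangChain's API, and it is not the code from the Snowflake Summit demo. The `choose_strategy` function is a hypothetical keyword-based stand-in for the decision an orchestrating LLM would make, and both tools are placeholders.

```python
from typing import Callable, Dict, Tuple


def choose_strategy(question: str) -> str:
    """Hypothetical stand-in for the orchestrating LLM's decision step."""
    q = question.lower()
    if any(phrase in q for phrase in ("how many", "total", "average")):
        return "query_data"
    return "chat"


def query_data(question: str) -> str:
    # Placeholder for the step that would generate and run SQL.
    return f"[would generate and run SQL for: {question}]"


def chat(question: str) -> str:
    # Placeholder for a plain conversational reply.
    return f"[would answer conversationally: {question}]"


# Registry mapping each strategy name to the tool that carries it out.
TOOLS: Dict[str, Callable[[str], str]] = {
    "query_data": query_data,
    "chat": chat,
}


def orchestrate(question: str) -> Tuple[str, str]:
    """Pick a strategy, then dispatch the question to the matching tool."""
    strategy = choose_strategy(question)
    return strategy, TOOLS[strategy](question)


print(orchestrate("How many orders shipped last week?"))
print(orchestrate("Hello there"))
```

Keeping the decision step separate from the tools is the point of the pattern: the model that plans does not have to be the model that writes the SQL.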

Everyone Will Have Their Own LLM

I asked Treuille how he thinks the LLM app ecosystem will evolve over the next few years, and what Snowflake’s role will be in this.

“If I could describe a North Star,” he replied, “it would be: talk to your data.”

Eventually, he thinks the industry will get to a place where every enterprise company essentially has its own LLM that “embodies all their knowledge.” He acknowledged that “it’ll be a little bit more structured than that — you may have an LLM that embodies all the knowledge, [but] you will still have structured databases against which you can run queries, and there’s going to be some non-trivial logic in-between.”

But from a product point of view, enterprise customers will end up with what they view as their own custom LLM solution. Which, of course, Snowflake hopes to provide.
