How LMQL, a Superset of Python, Helps Developers Use LLMs

Chatting with ChatGPT is easy enough, but natural language has its limits when it comes to interfacing with large language models (LLMs).
“The thing about natural language is that, by definition, it’s not formal language — it’s an informal form of speech, which means it’s less precise,” said Luca Beurer-Kellner, a PhD student at the Department of Computer Science, ETH Zürich, and part of the Secure, Reliable, and Intelligent Systems Lab. “You can try to be very precise also with natural language, but of course, this has its limits.”
In an academic paper published in May, Beurer-Kellner, along with Marc Fischer and Martin Vechev, proposed another way to interact with generative AI models: Language Model Query Language (LMQL), a programming language designed for interacting with large language models that adds lightweight scripting and output constraints on top of natural language queries.
“The fundamental thing we observed was […] the way you work with them [LLMs], you prompt them to ask them about all sorts of things to complete all sorts of tasks for you,” Beurer-Kellner told The New Stack. “We found this to exhibit certain parallels to programming because, originally, our research group is also focused on the intersection of programming language research and machine learning research.”
LMQL Helps Squeeze More Value from LLMs
LMQL is a superset of Python that lets developers apply the formal aspects of programming languages on top of natural language, Beurer-Kellner told The New Stack. This makes prompting more precise and convenient, while keeping it easy and intuitive to understand, he said.
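To give a flavor of the syntax, here is a minimal sketch of an LMQL query in the style of the paper's examples (the prompt and model name are illustrative):

# argmax picks the decoding strategy, the strings form the prompt,
# [ANSWER] is a hole the model fills in, and the where clause
# declaratively constrains what the model may generate.
argmax
   "Q: What is the capital of France?\n"
   "A: [ANSWER]"
from
   "openai/text-davinci-003"
where
   len(TOKENS(ANSWER)) < 20

Because LMQL is a superset of Python, ordinary Python code, such as loops, function calls, and variable assignments, can appear between the prompt strings.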

Screenshot from Prompting Is Programming: A Query Language for Large Language Models
That can also unlock more potential from large language models: LMQL can establish an interface that lets applications benefit from LLMs and machine learning outside the confines of a chatbot, he added.
“What’s also very interesting from a machine learning perspective is that these models can do all sorts of things,” Beurer-Kellner said. “It’s amazing that they can actually have a conversation with you, but they’re really good classification models, for instance, or they can do entity tagging, or image captioning, all sorts of things — even though fundamentally they’re text input/text output.”
LLMs can be built into downstream applications as well, he said. This makes them ready-to-go machine learning models for all sorts of domains, without any additional training, he added.
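The paper demonstrates this with classification. The sketch below, modeled on its sentiment example (the review text and model name are illustrative), uses LMQL's distribution clause to score a fixed set of labels rather than generate free-form text:

# The distribution clause evaluates the probability of each listed
# label as the continuation for CLS, turning the LLM into a
# ready-made classifier with no extra training.
argmax
   "Review: The room was clean and the staff were friendly.\n"
   "Q: Is this review positive or negative?\n"
   "A:[CLS]"
from
   "openai/text-davinci-003"
distribution
   CLS in [" positive", " negative"]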
“By constraining and forcing the model into a certain structure and template, you can make sure the model always adheres to an interface that you define ahead of time,” he said. “It’s not just done by hoping for the best and prompting the model to really do this, but it’s actually forcing the model in a strict way, meaning 100% of the time you’re going to get a yes/no answer. There’s really no way for the model to produce any other token if you specified it to do so.”
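In LMQL, that guarantee is expressed as a constraint. A minimal sketch (the prompt and model name are illustrative):

# The where clause rules out every continuation except the listed
# values, so downstream code can rely on ANSWER being exactly
# "yes" or "no".
argmax
   "Is Paris the capital of France? Answer yes or no.\n"
   "Answer: [ANSWER]"
from
   "openai/text-davinci-003"
where
   ANSWER in ["yes", "no"]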
Tokens are the units LLMs actually calculate over when they make a prediction; each word is either a single token or is broken into several tokens.
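For a quick illustration of how text maps to tokens, the sketch below uses OpenAI's tiktoken library (assuming it is installed); the exact splits depend on the tokenizer:

# Encode a few words with a GPT-style tokenizer and show how each
# splits into one or more tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["cat", "tokenization"]:
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
# A common word like "cat" is usually a single token, while a rarer
# word like "tokenization" is split into several.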
API Cost Savings with LMQL
LMQL is also a declarative language, meaning it describes what to do rather than how to do it; SQL and HTML are declarative languages. However, LMQL also has aspects of imperative languages such as C, C++, Java, and Python, which describe how to do something.
“[If] you want a certain output to always be an integer number, for instance, these things we represent declaratively, which also makes LMQL almost look like SQL. But then the prompting itself, when you construct your input and you want to pull in some data from your external sources or concatenate different things together — this can be done in a fully imperative style, just like in Python,” Beurer-Kellner explained. “We tried to implement different paradigms for these different aspects to make sure all of them are accommodated in [a] more or less convenient way.”
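A sketch of what that mix looks like (the topics and model name are illustrative): the prompt is assembled with an ordinary Python loop, while the constraint sits declaratively in the where clause:

# The for loop builds the prompt imperatively, interpolating each
# topic into the string; the where clause declaratively forces
# every SCORE to be an integer.
argmax
   "Rate each topic from 1 to 10.\n"
   for topic in ["coffee", "tea"]:
      "{topic}: [SCORE]\n"
from
   "openai/text-davinci-003"
where
   INT(SCORE)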
One useful side effect of using LMQL is that it can reduce the cost of using LLMs by cutting out or shortening API calls to the model, its creators found.
That’s significant: language models are typically very large neural networks, and practical inference demands high computational cost and significant latency, the paper explained. With pay-to-use APIs, that translates into high usage costs per query answered.
For instance, if the model generates beyond the needed response, LMQL can intercept it early to ensure it doesn’t wander, he said.
“We can actually restrict the space or the continuation of the models […] during text generation,” he said. “Also, if it runs off in a direction, we can intercept early on, meaning we can terminate early and make sure that it doesn’t generate lots of text that will not be needed anyway; and on all this text you don’t end up generating, you save compute or API costs.”
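In query form, that early termination is just another constraint. A sketch (the prompt and model name are illustrative):

# STOPS_AT cuts ANSWER off at the first period, and the token-length
# bound caps the worst case, so the model never bills for text the
# application would throw away.
argmax
   "Summarize LMQL in one sentence.\n"
   "Summary: [ANSWER]"
from
   "openai/text-davinci-003"
where
   STOPS_AT(ANSWER, ".") and len(TOKENS(ANSWER)) < 50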