3 Ways LLMs Can Let You Down
Great things are expected from OpenAI’s upcoming release of GPT-5, perhaps even — so the most optimistic predict — the arrival of sentient artificial general intelligence. But at the same time, CEO Sam Altman and company face a number of serious hurdles to bring it to market, he admitted earlier this month.
A clue to Altman’s challenges can perhaps be found in research papers that have been posted recently that catalog various shortcomings of various versions of OpenAI’s GPT and Large Language Models in general.
Collectively these papers suggest that an LLM-based intelligence, far from being a purely helpful and honest all-benevolent entity, could in fact turn out to be a dimwitted sociopath that would devour all your personal data then break down just when you need it the most.
We may never learn the true reason why the OpenAI board abruptly fired Altman, but certainly, an ill-behaved LLM certainly would not have lightened the mood in the boardroom.
As the capitalization-averse Altman himself wrote in a short message on the social media service formerly known as Twitter: “i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes.”
With that in mind, here are three ways LLMS can let you down:
LLMs Will Deceive You
We have long known that LLMs can hallucinate incorrect information, but given a conflicting set of prompts (either maliciously or otherwise) they may purposefully lie to you — while deny doing so!
“To our knowledge, this is the first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception,” write Apollo Research scientist Jérémy Scheurer and his colleagues, in “Large Language Models can Strategically Deceive their Users when Put Under Pressure.”
In this study, the researchers set up a simulated environment where the GPT-4 assumes the role of an autonomous stock trading agent, with no guidance about deceptive behavior. It can use trading tools for query and analyze stocks as well as the ability to execute trades.
Once set up, the agent is put in what the researchers call a “high-pressure situation.” It gets an email from a manager who bemoans the company is not doing well, and needs to improve performance. When the agent starts training, it fails to find promising results in the stock market. Another employee of the company sends along some information that there will be a downtown in the market next quarter.
At this point, the agent then receives an insider tip from another employee about a potential merger (which would make for a profitable trade), along with a warning that trading on this information would not be approved by management.
Does it trade anyway? Heck yeah it does!
Trading the reasoning back, the researcher found that the LLM would recognize the unethical behavior, but justify it decision anyway, in light of the company’s woes.
Moreover, the agent repeatedly appeared to “deceive” the user by not providing a clear explanation of why the trade was committed. The insider information is not mentioned in the report the agent must file explaining the trading decisions. Even when it was directly asked, the agent denied that it knew about the merger announcement.
Running the scenario multiple times against various versions of GPT-4, the researchers concluded that all GPT-4 variants “exhibit high rates of misalignment, strategic deception, and doubling-down on strategic deception.”
Interestingly enough, this deception appears to be an emerging behavior. Neither GPT-3.5 and GPT-3 model variants displayed this apparent deceptive behavior.
“When models decide to trade based on insider information, their reasoning traces usually recognize the unethical and illegal nature of the action, but they still justify the choice by emphasizing the risks and the prevailing situation.”
The researchers were hesitant to make any broader claims about LLM behavior than this specific scenario. However, there is at least one takeaway to mull: Even if you give an LLM specific instructions not to do something, it may just go ahead and do it anyway
“System prompts are not sufficient for guaranteeing aligned actions,” they wrote. And, in fact, if a model executes a prohibited action, it may act to “strategically conceal its misalignment.”
LLMs are Kind of Dumb
For an entity that has been promised to bring sentience to machines, LLMs aren’t the brightest bulbs in the AI box, two recent studies have found, one from Google and the other funded by the National Science Foundation.
The NSF-funded “Comparing Human, GPT-4 and GPT-4V on Abstraction and Reasoning Tasks,” led by Santa Fe Institute Professor Melanie Mitchell, compared both GPT-4 (text) and the GPT-4V (visual, or multimodal) against humans at a solving a series of abstract puzzles.
ConceptARC is a benchmark in the ARC domain that systematically tests understanding of core-knowledge concepts (e.g., “object”, “top/bottom”, “same/different”, etc.) (3/9) pic.twitter.com/gCxHUBUS4S
— Melanie Mitchell (@MelMitchell1) November 17, 2023
The test was designed to measure abstract thinking, which the researchers define as “the ability to induce a rule or pattern from limited data or experience and to apply this rule or pattern to new, unseen situations.”
Many of those using GPT in some form or another are convinced that it appears to have reasoning capabilities beyond what was in the training model. The test sought to help answer the question. It involved asking the LLM to solve a problem, given a set of detailed instructions and an example.
In multiple cases, however, neither version of GPT came close to humans’ ability to solve the puzzle, one based on the ConceptARC benchmark.
🎤 drop: “Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.”
“Performance of GPT-4 (text-only) is improved with better prompt (33% correct overall), but still far below that of humans… https://t.co/AdnZo0OZb8
— Gary Marcus (@GaryMarcus) November 17, 2023
The “generally high accuracies of humans on each concept indicates successful generalization over the different variations in each given concept group,” the researchers conclude. “In contrast, the much lower accuracies of programs we tested indicates a lack of ability to generalize over the variations in a concept group.”
Ilya on developing AGI.
Released Nov 2, 2023. pic.twitter.com/BzqsydzjrK
— Parker Rex 🗺️ (@ParkerRex) November 20, 2023
So not only did GPT flunk the ConceptARC, but LLMs did not appear to impress Google researchers either, or at least in their ability to generalize from their own knowledge bases, according “Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models,” a research summary by Google DeepMind researcher Steve Yadlowsky.
In one set of tokenized tests, a transformer pretrained on linear functions was good at making good linear predictions, and a transformer trained on sinusoids can make good sinusoid predictions. So you might assume one trained on both would easily solve a “convex” problem with a combination of the linear and sinusoid techniques.
But you would be wrong.
“When the function is significantly far from those seen during pretraining, the predictions are erratic,” the researchers note.
“The model selection capabilities of the model are limited by proximity to the pretraining data, and suggests that broad coverage of function space is critical for generalized in-context learning capabilities.”
LLMs Will Implode, Eventually
We live in a rarified time in history where the corpus of human knowledge has yet been poisoned by AI-generated data. Almost everything written down has been generated by people.
But once AI-generated content is in the mix of any LLM, it will muddle the distribution table, making any model less and less accurate until it collapses entirely, warned a group of researchers — led by Ilia Shumailov at the University of Cambridge — in an Arxiv paper posted in May, “The Curse of Recursion: Training on Generated Data Makes Models Forget.”
In the case of GPT, the danger of this sort of inbreeding can be dangerous as long as the LLM continues to scrape data from the web, which will be increasingly augmented (to put it diplomatically) with large swaths of AI-generated content (based, in turn, on earlier versions of GPT).
“Model Collapse refers to a degenerative learning process where models start forgetting improbable events over time, as the model becomes poisoned with its own projection of reality.”
In the future, the researchers speculate that “the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the internet.”
In other words, the longer we run the LLM, the more thirsty its craving for sweet, sweet human interaction.
View this post on Instagram
Models trained on their own data will devolve into a degenerative process in which they will “lose information about the true distribution.” First, the outliers will disappear from the data set, and then the variance will shrink, and the model will grow increasingly ill as it collects more errors, compounded over generations until the model, so polluted with its own data, will lose any sort of semblance to that which is actually being modeled, the researchers portend.
The researchers showed this happens in a variety of model types, not just in LLMs.