
Improving ChatGPT’s Ability to Understand Ambiguous Prompts

Prompt engineering technique helps large language models (LLMs) handle pronouns and other complex coreferences in retrieval augmented generation (RAG) systems.
Nov 27th, 2023 9:00am by
Featured image by Michael Dziedzic on Unsplash.

In the ever-expanding realm of AI, large language models (LLMs) like ChatGPT are driving innovative research and applications at an unprecedented speed. One significant development is the emergence of retrieval augmented generation (RAG). This technique pairs an LLM with a vector database that acts as long-term memory, improving the accuracy of generated responses. One example of the RAG approach is the open source project Akcio, which offers a robust question-answer system.

Akcio’s architecture

In Akcio’s architecture, domain-specific knowledge is seamlessly integrated into a vector store, such as Milvus or Zilliz (fully managed Milvus), using a data loader. The vector store retrieves the Top-K most relevant results for the user’s query and conveys them to the LLM, providing the LLM with context about the user’s question. Subsequently, the LLM refines its responses based on the external knowledge.
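The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not Akcio's actual code: the in-memory `STORE`, the hand-made three-dimensional vectors and the helper names `top_k` and `build_rag_prompt` are all assumptions made for the example. A real deployment would use an embedding model and a vector store such as Milvus.

```python
import math

# Hypothetical in-memory "vector store": hand-made 3-d vectors stand in
# for real embeddings produced by an embedding model.
STORE = [
    {"text": "LLMs are used for code generation.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Vector databases store embeddings.", "vector": [0.0, 1.0, 0.0]},
    {"text": "LLMs support decision-making.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Bread needs yeast.", "vector": [0.0, 0.0, 1.0]},
]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]


def build_rag_prompt(question, passages):
    """Combine the retrieved passages with the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


passages = top_k([1.0, 0.0, 0.0], STORE, k=2)
prompt = build_rag_prompt("What are LLMs used for?", passages)
```

The resulting `prompt` string is what would be sent to the LLM. The retrieval step only sees the query text, which is why an unresolved pronoun in a follow-up question can pull in irrelevant passages.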

For instance, if a user queries, “What are the use cases of large language models in 2023?” about an article titled “Insights Report on the Progress of Large Language Models in 2023” that was imported into Akcio, the system adeptly retrieves the three most relevant passages from the report:

Akcio combines these passages with the original query and forwards them to the LLM, generating a nuanced and precise response:

The Challenge of Coreference Resolution in RAG

However, despite the strides made, implementing RAG systems introduces challenges, particularly in multi-turn conversations involving coreference resolution. Consider this sequence of questions:

The pronoun “their” in Q2 refers to “generative AI and decision-making.” Yet the LLM might generate results that are irrelevant to the question and undermine the conversation’s coherence:

Using ChatGPT for Coreference Resolution

Traditional methods, such as tokenization, lemmatization and keyword replacement using recurrent neural networks, are often inadequate for resolving complex references. Consequently, researchers have turned to LLMs like ChatGPT for coreference resolution tasks. This approach involves instructing ChatGPT to either substitute the pronouns or retain the original question, based on the context provided. While this method is promising, ChatGPT occasionally answers the question directly instead of following the prompt instructions, which indicates the need for a more refined strategy.
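As a rough illustration of this kind of instruction, the sketch below assembles a rewrite prompt from a conversation history and a follow-up question. The wording of the instruction, the function name `build_coref_prompt` and the sample conversation are assumptions for the example, not the prompt used in the experiments.

```python
def build_coref_prompt(history, question):
    """Build a prompt asking the model to rewrite, not answer, the question.

    history: list of (question, answer) tuples from earlier turns.
    The instruction text is illustrative only.
    """
    lines = [
        "Rewrite the follow-up question so that every pronoun or ambiguous",
        "reference is replaced with the entity it refers to in the history.",
        "If nothing needs replacing, return the question unchanged.",
        "Do not answer the question.",
        "",
        "History:",
    ]
    if history:
        for past_question, past_answer in history:
            lines.append(f"Q: {past_question}")
            lines.append(f"A: {past_answer}")
    else:
        lines.append("(empty)")
    lines += ["", f"Follow-up question: {question}", "Rewritten question:"]
    return "\n".join(lines)


# Hypothetical conversation used only to show the prompt's shape.
history = [("What is NLP?", "NLP stands for natural language processing.")]
prompt = build_coref_prompt(history, "What is it used for?")
```

The model's reply to `prompt`, ideally a rewritten question with the pronoun replaced, would then be used as the retrieval query in place of the original question.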

Examples Tested

We experimented with straightforward commands urging ChatGPT to replace pronouns using the following prompt format:

Example 1


ChatGPT’s response:

In this case, ChatGPT did a great job, replacing “it” with “Natural Language Processing (NLP).”

Example 2


ChatGPT’s response:

In this case, ChatGPT struggled with substituting “this year,” leading to an incomplete resolution.

Example 3


ChatGPT’s response:

Unfortunately, ChatGPT diverged from the instructions here, possibly due to its intricate decision-making process. Despite our efforts to reinforce the prompt, ChatGPT occasionally veered toward direct answers, complicating the coreference resolution task.

Few-Shot Prompt with Chain of Thought: A Refined Approach

Prompt engineering plays a pivotal role in harnessing LLMs effectively. We decided to test combining few-shot prompts with the Chain of Thought (CoT) method as a promising strategy. Few-shot prompts present LLMs with multiple reference examples, guiding them to emulate those examples in their responses. CoT enhances LLMs’ performance in complex reasoning tasks by encouraging step-by-step reasoning in their answers.

By integrating these techniques, we developed a prompt format to guide ChatGPT through coreference resolution. The revised prompt format includes examples with an empty conversation history, basic examples, failed pronoun replacements and cases involving multiple pronouns, giving ChatGPT more explicit instructions and reference examples. Instances where ChatGPT returns “NEED COREFERENCE RESOLUTION: Yes” are crucial, as they indicate that ChatGPT needs to replace pronouns or ambiguous references for a coherent response.
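One way such a prompt could be assembled and its reply consumed is sketched below. Only the “NEED COREFERENCE RESOLUTION” label comes from the text above. The `REASONING` and `OUTPUT QUESTION` labels, the two sample examples and the function names are hypothetical and stand in for whatever the real prompt uses.

```python
import re

# Hypothetical few-shot examples: one with an empty history, one basic
# pronoun replacement. Each shows its reasoning before the final output.
FEW_SHOT_EXAMPLES = [
    {
        "history": [],
        "question": "What is Milvus?",
        "reasoning": "The history is empty and the question has no pronouns.",
        "need": "No",
        "output": "What is Milvus?",
    },
    {
        "history": [("What is Milvus?", "Milvus is a vector database.")],
        "question": "How do I install it?",
        "reasoning": '"it" refers to "Milvus" in the history.',
        "need": "Yes",
        "output": "How do I install Milvus?",
    },
]


def format_history(history):
    """Render (question, answer) tuples, or [] for an empty history."""
    if not history:
        return "[]"
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in history)


def build_few_shot_prompt(history, question, examples=FEW_SHOT_EXAMPLES):
    """Prepend worked examples, then leave REASONING open for the model."""
    parts = [
        "Decide whether the question needs coreference resolution, reason "
        "step by step, then output the final question. "
        "Never answer the question."
    ]
    for ex in examples:
        parts.append(
            f"History:\n{format_history(ex['history'])}\n"
            f"Question: {ex['question']}\n"
            f"REASONING: {ex['reasoning']}\n"
            f"NEED COREFERENCE RESOLUTION: {ex['need']}\n"
            f"OUTPUT QUESTION: {ex['output']}"
        )
    parts.append(
        f"History:\n{format_history(history)}\n"
        f"Question: {question}\nREASONING:"
    )
    return "\n\n".join(parts)


def parse_reply(reply, original_question):
    """Use the rewritten question only when the model flags it as needed."""
    need = re.search(r"NEED COREFERENCE RESOLUTION:\s*(Yes|No)", reply, re.IGNORECASE)
    output = re.search(r"OUTPUT QUESTION:\s*(.+)", reply)
    if need and need.group(1).lower() == "yes" and output:
        return output.group(1).strip()
    # Fall back to the user's own wording on "No" or a malformed reply.
    return original_question
```

Falling back to the original question when the labels are missing means a reply that drifts into a direct answer never replaces the user's query.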

Here is a refined prompt format:

Examples Tested and Refined Responses

Here are some results from our experiments with refined prompts:

Example 1


ChatGPT’s refined response:

Example 2


ChatGPT’s refined response:

The refined prompt format significantly enhances ChatGPT’s ability to handle intricate coreference resolution tasks. Questions involving multiple entities, which previously posed challenges, are now addressed effectively. ChatGPT adeptly substitutes pronouns and ambiguous references, delivering accurate and contextually relevant responses.


Prompt engineering plays a pivotal role in resolving coreference problems in RAG systems that use LLMs. By combining few-shot prompts with the CoT method, we’ve significantly improved the handling of complex references in RAG systems, enabling LLMs like ChatGPT to substitute pronouns and ambiguous references accurately and produce coherent responses.
