New Method for Detecting AI Hallucinations Introduced

Today's generative AI tools like ChatGPT have a problem: They often confidently provide incorrect information.

Eulerpool News, Jun 25, 2024, 3:15 PM

A persistent challenge with today's generative artificial intelligence (AI) systems such as ChatGPT is that they often confidently assert false information. This behavior, which computer scientists call "hallucination," is a significant obstacle to the usefulness of AI.

Hallucinations have already led to some embarrassing public incidents. In February, Air Canada was ordered by a tribunal to honor a discount mistakenly offered to a passenger by its customer service chatbot.

In May, Google had to make changes to its new search feature "AI Overviews" after the bot told some users that it was safe to eat rocks.

And in June of last year, two lawyers were fined $5,000 by a U.S. judge after one of them admitted to using ChatGPT to assist in drafting a complaint. The chatbot had inserted fake citations into the submission, referencing non-existent cases.

Some good news for lawyers, search-engine giants, and airlines: at least some types of AI hallucinations could soon be a thing of the past. Research published Wednesday in the scientific journal Nature describes a new method for detecting AI hallucinations.

The method is able to distinguish between correct and incorrect AI-generated answers in about 79 percent of cases – roughly ten percentage points higher than other leading methods. Although the method addresses only one cause of AI hallucinations and requires about ten times more computational power than a standard chatbot conversation, the results could pave the way for more reliable AI systems.

"My goal is to open up ways to use large language models where they are currently not used—where a bit more reliability than currently available is required," says Sebastian Farquhar, one of the authors of the study and a Senior Research Fellow in the Department of Computer Science at the University of Oxford, where the research was conducted.

Farquhar is also a researcher on the safety team at Google DeepMind. Regarding the lawyer who was penalized because of a ChatGPT hallucination, Farquhar says: "This would have helped him."

The term "hallucination" has gained importance in the world of AI but is also controversial. It implies that models have a kind of subjective world experience, which most computer scientists deny. Additionally, it suggests that hallucinations are a solvable quirk and not a fundamental problem of large language models. Farquhar's team focused on a specific category of hallucinations that they call "confabulations.

These occur when an AI model gives inconsistent incorrect answers to a factual question, as opposed to consistently incorrect answers, which are more likely to stem from problems with the model's training data or structural errors in its logic.

The method for detecting confabulations is relatively simple. First, the chatbot is asked to give several responses to the same input. Then, the researchers use a different language model to group these responses by their meaning.

The researchers then calculate a metric they call "semantic entropy," a measure of how similar or different the meanings of the responses are. High semantic entropy indicates that the model is confabulating.
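
To make the idea concrete, here is a minimal Python sketch of the sampling-and-entropy step. It is an illustration under simplifying assumptions, not the study's implementation: the function names (`semantic_entropy`, `same_meaning`) are hypothetical, and the crude string matcher stands in for the separate language model that, in the published method, judges whether two answers mean the same thing.

```python
import math

def semantic_entropy(answers, same_meaning):
    """Rough estimate of semantic entropy for answers sampled from a chatbot.

    answers: list of answer strings sampled for the same question.
    same_meaning: callable deciding whether two answers express the same claim
    (the study uses a separate language model for this; here it is a stand-in).
    """
    # Greedily group the sampled answers into clusters of equivalent meaning.
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])

    # Treat each cluster's share of the samples as its probability and compute
    # Shannon entropy over meanings rather than over exact wordings.
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)

# Toy check with a crude matcher that only normalizes capitalization and punctuation.
matcher = lambda a, b: a.lower().strip(". ") == b.lower().strip(". ")

consistent = ["Paris.", "Paris", "paris"]        # one meaning across samples
scattered = ["Paris.", "Lyon.", "Marseille."]    # three different meanings

print(semantic_entropy(consistent, matcher))  # 0.0  -> low entropy, likely reliable
print(semantic_entropy(scattered, matcher))   # ~1.1 -> high entropy, likely confabulating
```

In the study itself, the grouping is done by a language model that checks whether answers mutually entail each other; that repeated sampling and grouping is what accounts for the extra computation mentioned above.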

The semantic entropy method outperformed other approaches for detecting AI hallucinations. Farquhar has some ideas on how it could help reduce hallucinations in leading chatbots.

He believes that this could, in theory, allow OpenAI to add a button to ChatGPT that lets users gauge how certain the chatbot is of an answer. The method could also be integrated into other tools that use AI in high-stakes settings where accuracy is crucial.

While Farquhar is optimistic, some experts warn against overestimating the immediate impact. Arvind Narayanan, Professor of Computer Science at Princeton University, emphasizes the challenges of integrating this research into real-world applications.

He points out that hallucination is a fundamental feature of how large language models work and that the problem is unlikely to be fully resolved in the near future.
