GPT-5 hallucinates less, says OpenAI – a look at the data
OpenAI's new AI model, GPT-5, has been launched, and it's making waves for its significant reduction in hallucinations compared to previous models. This advancement is attributed to GPT-5's improved reasoning capabilities, instruction following, and tool intelligence, leading to more reliable and accurate outputs.
In the SimpleQA evaluation, a benchmark that measures factual accuracy on short questions answered without web access, GPT-5's hallucination rates were notably lower than its predecessors'. Specifically, GPT-5's main version had a hallucination rate of 47 percent, while the thinking version recorded 40 percent. This is an improvement over GPT-4o's 52 percent and o3's 46 percent.
On factual benchmarks without tool use, GPT-5 shows strong results. On the LongFact-Concepts and LongFact-Objects tests, GPT-5's hallucination rates were 1.0 percent and 1.2 percent, respectively, compared to o3's 5.2 percent and 6.8 percent, and GPT-4o's 3.0 percent and 8.9 percent. On the FactScore test, GPT-5 cut the hallucination rate to 2.8 percent, far below o3's 23.5 percent and GPT-4o's 38.7 percent.
In chat-based testing with "thinking" enabled, GPT-5's hallucination rate drops to a mere 4.8 percent, a significant decrease from o3's 22 percent and GPT-4o's 20.6 percent.
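As a sanity check on the "fold reduction" framing, the ratios implied by the rates quoted above can be recomputed in a few lines of Python (the numbers are the article's; the script itself is purely illustrative):

```python
# Hallucination rates (percent) as quoted above, per benchmark.
rates = {
    "LongFact-Concepts": {"GPT-5": 1.0, "o3": 5.2, "GPT-4o": 3.0},
    "LongFact-Objects":  {"GPT-5": 1.2, "o3": 6.8, "GPT-4o": 8.9},
    "FactScore":         {"GPT-5": 2.8, "o3": 23.5, "GPT-4o": 38.7},
    "Chat (thinking)":   {"GPT-5": 4.8, "o3": 22.0, "GPT-4o": 20.6},
}

# For each benchmark, how many times lower GPT-5's rate is than each rival's.
for bench, r in rates.items():
    for rival in ("o3", "GPT-4o"):
        fold = r[rival] / r["GPT-5"]
        print(f"{bench}: {fold:.1f}x lower than {rival}")
```

The ratios range from roughly 3x (LongFact-Concepts vs GPT-4o) to nearly 14x (FactScore vs GPT-4o), which is where the "several-fold reduction" summary comes from.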
In simulated real-world task performance tests, GPT-5 scores 63.5 percent in airline website navigation, slightly below o3’s 64.8 percent, and 81.1 percent in retail website navigation, just under Claude Opus 4.1’s 82.4 percent.
OpenAI attributes these improvements to investments in model training, web search integration during evaluation, better reasoning on complex questions, and safety research minimizing deception and improving response honesty. GPT-5 is noted for being safer, more transparent, and better at discerning misuse attempts while reducing hallucinations across multiple domains and evaluation settings.
However, it's important to note that users running GPT-5 without web search face a much higher risk of hallucinations and inaccuracies. If you're using ChatGPT for anything important, make sure web access is enabled to minimize those risks.
In the launch demo of GPT-5 explaining how planes fly, Beth Barnes, founder and CEO of the AI research nonprofit METR, spotted an inaccuracy: GPT-5's explanation of the Bernoulli effect was incorrect.
Overall, GPT-5 delivers roughly a four- to six-fold reduction in hallucination rates compared to its predecessors on several benchmarks and practical usage scenarios, setting a new bar for factual reliability in large language models.