GPT-5 hallucinates less, says OpenAI – a look at the data
OpenAI's new AI model, GPT-5, has been launched, and it's making waves for its significant reduction in hallucinations compared to previous models. This advancement is attributed to GPT-5's improved reasoning capabilities, instruction following, and tool intelligence, leading to more reliable and accurate outputs.
In the SimpleQA evaluation, a benchmark that measures factual accuracy on short questions answered without web access, GPT-5's hallucination rates were notably lower than its predecessors'. Specifically, GPT-5's main version had a hallucination rate of 47 percent, while the thinking version recorded 40 percent. This is an improvement over GPT-4o's 52 percent and o3's 46 percent.
On factual benchmarks without tool use, GPT-5 shows strong results. On the LongFact-Concepts and LongFact-Objects tests, GPT-5's hallucination rates were 1.0 percent and 1.2 percent, respectively, compared to o3's 5.2 percent and 6.8 percent, and GPT-4o's 3.0 percent and 8.9 percent. On the FactScore test, GPT-5 cut the hallucination rate to 2.8 percent, far below o3's 23.5 percent and GPT-4o's 38.7 percent.
In chat-based testing with "thinking" enabled, GPT-5's hallucination rate drops to a mere 4.8 percent, a significant decrease from o3's 22 percent and GPT-4o's 20.6 percent.
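As a sanity check on the "fold reduction" framing, the ratios implied by the rates quoted above can be recomputed in a few lines of Python (the numbers are the article's; the script itself is purely illustrative):

```python
# Hallucination rates (percent) as quoted above, per benchmark.
rates = {
    "LongFact-Concepts": {"GPT-5": 1.0, "o3": 5.2, "GPT-4o": 3.0},
    "LongFact-Objects":  {"GPT-5": 1.2, "o3": 6.8, "GPT-4o": 8.9},
    "FactScore":         {"GPT-5": 2.8, "o3": 23.5, "GPT-4o": 38.7},
    "Chat (thinking)":   {"GPT-5": 4.8, "o3": 22.0, "GPT-4o": 20.6},
}

# For each benchmark, how many times lower GPT-5's rate is than each rival's.
for bench, r in rates.items():
    for rival in ("o3", "GPT-4o"):
        fold = r[rival] / r["GPT-5"]
        print(f"{bench}: {fold:.1f}x lower than {rival}")
```

The ratios range from roughly 3x (LongFact-Concepts vs GPT-4o) to nearly 14x (FactScore vs GPT-4o), which is where the "several-fold reduction" summary comes from.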
In simulated real-world task performance tests, GPT-5 scores 63.5 percent in airline website navigation, slightly below o3’s 64.8 percent, and 81.1 percent in retail website navigation, just under Claude Opus 4.1’s 82.4 percent.
OpenAI attributes these improvements to investments in model training, web search integration during evaluation, better reasoning on complex questions, and safety research minimizing deception and improving response honesty. GPT-5 is noted for being safer, more transparent, and better at discerning misuse attempts while reducing hallucinations across multiple domains and evaluation settings.
However, it's important to note that users running GPT-5 without web search face a much higher risk of hallucinations and inaccuracies. If you're using ChatGPT for anything important, make sure web access is enabled to minimize those risks.
In the launch demo of GPT-5 explaining how planes fly, Beth Barnes, founder and CEO of the AI research nonprofit METR, spotted an inaccuracy: GPT-5's explanation of the Bernoulli effect was incorrect.
Overall, GPT-5 delivers roughly a four- to six-fold reduction in hallucination rates compared to its predecessors on several benchmarks and practical usage scenarios, setting a new bar for factual reliability in large language models.