AI Hallucinations Challenge Reliability as Adoption Grows in Law, Healthcare, and Business

Image Credit: Google DeepMind | Unsplash
Artificial intelligence systems, now widely deployed for generating text and assisting with tasks from legal research to customer support, are facing renewed scrutiny over “hallucinations”—instances where AI produces false or fabricated information, often with high confidence.
Understanding AI Hallucinations
AI hallucinations refer to cases where large language models (LLMs), the technology behind systems such as ChatGPT and Claude, generate inaccurate or entirely made-up information. Unlike human errors, these mistakes arise because AI models rely on statistical patterns found in training data rather than true comprehension. For example, in 2025, an AI support chatbot built into the coding tool Cursor incorrectly announced a policy change that did not exist, misleading users. In another instance, a California copyright lawsuit against Anthropic involved a filing by law firm Latham & Watkins containing a footnote with an erroneous citation, later traced to an AI-generated error. Such cases highlight the tangible risks of AI inaccuracies in real-world settings.
Hallucinations became a prominent concern after the introduction of generative AI tools such as ChatGPT in late 2022 and remain a persistent challenge for developers and end users alike.
The Roots of AI Errors
Hallucinations primarily stem from technical limitations in how LLMs are designed. These models are trained on vast datasets that may contain errors, omissions, or biases. Their ability to process context is restricted by a fixed context window, which can lead to misinterpretation of long or complex queries. Overfitting, in which models memorize patterns rather than generalizing, and attention failures that distort how the input is weighted can further exacerbate errors. Most fundamentally, LLMs generate output by predicting the most likely next word rather than by checking claims against any source of truth, so some mistakes are inevitable: a fluent, statistically plausible continuation can still be factually wrong.
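To make that mechanism concrete, the toy Python sketch below mimics a single next-token prediction step. The vocabulary, the hard-coded probabilities, and the fake_model function are all invented for illustration; real LLMs compute such distributions from billions of learned parameters. The point that carries over is that the system ranks continuations by likelihood and has no built-in step that checks whether the chosen word is true.

```python
import numpy as np

# Toy illustration of next-token prediction (all values invented):
# the "model" assigns probabilities to candidate continuations and
# the sampler picks one. Nothing in this loop verifies factual truth.
vocab = ["Paris", "Lyon", "Rome", "Berlin"]

def fake_model(prompt: str) -> np.ndarray:
    # A real LLM would derive these scores from patterns in its training
    # data; here we hard-code logits that happen to favor a wrong answer.
    logits = np.array([2.0, 2.2, 0.5, 0.3])
    return np.exp(logits) / np.exp(logits).sum()  # softmax -> probabilities

probs = fake_model("The capital of France is")
next_token = np.random.choice(vocab, p=probs)

print(dict(zip(vocab, probs.round(2))))  # probability per candidate word
print("Generated:", next_token)          # can confidently emit "Lyon"
```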
Recent advances in "reasoning" models, AI systems designed to better mimic step-by-step human logic, have not solved the problem. In fact, OpenAI's April 2025 tests found that its o3 reasoning model hallucinated on 33% of questions in the company's PersonQA benchmark, while the smaller o4-mini hallucinated on 48%, roughly double and triple the 16% rate of the earlier o1 model, according to TechCrunch. The data indicates that newer models can exhibit similar or even higher error rates than their predecessors, depending on the task.
AI Errors in Critical Sectors
The impact of hallucinations varies by application. In casual settings, such as social chatbots, the risks are relatively low. But in high-stakes fields like law, medicine, or finance, hallucinations can have significant consequences. Legal professionals have faced court sanctions for submitting filings with AI-generated, non-existent citations, as illustrated by the Latham & Watkins case. In healthcare, inaccurate AI outputs risk misinforming diagnoses, while in business, reliance on incorrect data can erode trust or cause financial losses.
Reflecting the seriousness of these risks, companies such as Armilla have launched insurance products designed to cover legal costs and other liabilities arising from AI-generated errors. Armilla's coverage, for example, is underwritten with backing from Lloyd's of London insurers.
AI’s Benefits vs. Reliability Risks
Despite these challenges, AI continues to deliver substantial benefits by automating routine tasks and enabling rapid content generation. Retailers such as Zalando use generative AI to create marketing imagery, allowing them to keep pace with fast-moving fashion trends. In customer support, AI chatbots efficiently manage high volumes of inquiries, reducing operational costs.
However, hallucinations remain a key drawback. The confident tone of AI outputs can make errors difficult to detect, especially for non-experts. Furthermore, the opaque nature of how LLMs arrive at their answers makes error detection challenging even for developers. As a result, experts widely recommend that human oversight be maintained in critical applications, with AI acting as a tool rather than a replacement for human judgment.
New Challenges and Solutions
Recent findings show that hallucination rates can be higher in some newer AI models. Vectara, an AI reliability startup, reported in 2025 that DeepSeek's R1 reasoning model exhibited a 14.3% hallucination rate on its benchmark, while OpenAI's o3 reasoning model scored 6.8%. OpenAI has stated that factors such as verbosity may contribute to hallucinations, not just model complexity.
Efforts to address the issue are ongoing. Researchers at the University of Oxford have developed techniques to flag AI uncertainty, using measures like semantic entropy to signal when outputs may be unreliable. Vectara has introduced a “guardian agent” system, which cross-references AI-generated content to reduce hallucination rates to below 1% in certain applications. These solutions are part of a broader movement toward multi-step AI workflows and improved error detection.
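The intuition behind semantic entropy can be sketched in a few lines of Python. In the version below, the clustering rule is a deliberate stand-in (exact string match), whereas the Oxford researchers sample several answers to the same question and group them with a natural-language-inference model that tests whether answers mean the same thing; the function and variable names here are illustrative only.

```python
from math import log

def semantic_entropy(sampled_answers, same_meaning):
    """Group sampled answers into meaning clusters, then measure their spread."""
    clusters = []  # each cluster holds answers judged to mean the same thing
    for answer in sampled_answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    n = len(sampled_answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * log(p) for p in probs)  # near zero when answers agree

# Toy equivalence check: exact match stands in for an entailment model.
consistent = ["1912", "1912", "1912", "1912", "1912"]
scattered = ["1912", "1913", "1909", "1912", "1921"]
print(semantic_entropy(consistent, lambda a, b: a == b))  # low: answers agree
print(semantic_entropy(scattered, lambda a, b: a == b))   # high: likely unreliable
```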
The Future of AI Accuracy
Experts agree that hallucinations are likely to persist as long as AI relies on probabilistic modeling. Amr Awadallah, CEO of Vectara and a former Google executive, has stated, "Despite our best efforts, they will always hallucinate." Potential mitigation strategies include refining training datasets, implementing post-generation fact-checking, and improving context awareness within models. Prompt engineering, the careful crafting of user queries, can help reduce hallucinations but often requires technical expertise.
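As one hypothetical example of the prompt-engineering tactic mentioned above, the sketch below builds a prompt that confines the model to a supplied source and gives it an explicit way to decline rather than guess. The wording, function name, and variables are illustrative assumptions rather than guidance from any particular vendor, and prompts like this reduce rather than eliminate hallucinations.

```python
def build_grounded_prompt(question: str, source_text: str) -> str:
    # Constrain the answer to the supplied source and require a quotation,
    # with an explicit fallback so the model is not pushed to guess.
    return (
        "Answer the question using only the source below and quote the "
        "sentence you relied on. If the source does not contain the answer, "
        "reply exactly: 'The source does not say.'\n\n"
        f"Source:\n{source_text}\n\n"
        f"Question: {question}"
    )

print(build_grounded_prompt(
    question="When was the refund policy last updated?",
    source_text="The refund policy was last revised on 3 March 2024.",
))
```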
Hybrid systems combining AI with human review are gaining adoption as a way to balance efficiency and reliability. However, the growing use of synthetic data—AI-generated material used for training—has raised concerns about amplifying existing errors.
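A hybrid workflow of this kind can be sketched as a simple routing rule: outputs that clear an uncertainty threshold go out automatically, while everything else is queued for a person. The class, threshold, and scoring input below are illustrative assumptions, not a description of any deployed system.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewQueue:
    """Toy human-in-the-loop router: low-risk drafts ship, the rest wait."""
    pending: list = field(default_factory=list)

    def route(self, draft: str, uncertainty: float, threshold: float = 0.5) -> str:
        # 'uncertainty' could come from a signal such as semantic entropy;
        # here it is simply a number supplied by the caller.
        if uncertainty <= threshold:
            return f"auto-published: {draft}"
        self.pending.append(draft)  # held for a human reviewer
        return "escalated to human review"

queue = ReviewQueue()
print(queue.route("Routine order-status reply", uncertainty=0.1))
print(queue.route("Draft brief with case citations", uncertainty=0.9))
print(f"{len(queue.pending)} item(s) awaiting review")
```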
AI’s Growing Role and Trust Issues
AI use continues to expand, with major companies such as Microsoft and Google investing heavily in new infrastructure. The International Energy Agency projects that by 2030, AI data centers could consume as much as 12% of U.S. electricity. However, persistent hallucinations may undermine trust and limit AI’s role in sectors where accuracy is paramount.
Ultimately, building public trust in AI will require substantial progress in addressing hallucinations. Without improved reliability, skepticism could slow adoption, especially in areas where AI offers the greatest potential benefit.