In June 2023, a hearing at the United States District Court for the Southern District of New York became a landmark event in the global history of AI governance. Lawyer Steven Schwartz, in a personal injury lawsuit (Mata v. Avianca, Inc.), used ChatGPT to draft legal documents, citing six case precedents that appeared complete — with case numbers, courts, ruling dates, and legal reasoning — but were entirely fabricated by AI.[1] When Judge P. Kevin Castel asked the lawyer to provide the full text of these cases, Schwartz once again asked ChatGPT to confirm whether the cases were real — ChatGPT "confidently" responded: "Yes, these cases are real and can be found in reputable legal databases." The court ultimately fined Schwartz $5,000 and wrote in its opinion that this represented "the beginning of a bad precedent."[1] This case was not an isolated incident. In 2024, Google's AI Overview feature suggested in search results that users put non-toxic glue on pizza to prevent cheese from sliding off;[2] Air Canada's customer service chatbot fabricated a nonexistent "bereavement discount policy," and the court ultimately ruled that the airline must honor the false promise made by AI.[3] These events collectively point to a systemic risk that has been chronically underestimated in the AI industry: AI hallucination — generative AI producing content that appears plausible but is actually incorrect or entirely fabricated, delivered with a highly confident tone. In my experience leading Meta Intelligence in deploying AI systems for enterprises, and from my prior research on technology governance at the University of Cambridge, I have come to deeply recognize that AI hallucination is not merely a technical problem — it is a governance challenge that demands a systemic response across technical, institutional, and organizational dimensions.

I. The Cognitive Science of AI Hallucination: Why Do Large Language Models "Lie"?

First, a fundamental concept must be clarified: large language models (LLMs) are not "lying" — because lying presupposes intent and cognition, neither of which LLMs possess. More precisely, LLMs are "hallucinating" — the content they generate is not based on an understanding of facts, but on statistical patterns in training data.[4]

From a technical perspective, AI hallucinations can be classified into two major categories. Factuality hallucination refers to the model generating content that contradicts verifiable facts — such as fabricated legal precedents, nonexistent academic papers, or incorrect statistics. Faithfulness hallucination refers to the model's output being inconsistent with its input or existing context — for example, adding information not present in the source document during a summarization task, or altering the meaning of the original text during translation.[4]

The root cause of hallucination lies in the architectural nature of LLMs. Transformer models generate text through "next token prediction": at each step they select a highly probable next token, where probability reflects statistical correlation in the training data, not factual correctness.[5] When the model encounters questions insufficiently covered in its training data, it does not say "I don't know," because its training objective is to generate fluent text, not accurate information. This architectural characteristic means that hallucination is not a bug in LLMs but a byproduct of their design: the very characteristics that make LLMs excel at creative writing (fluency, coherence, seemingly plausible chains of reasoning) are precisely what cause them to hallucinate.
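The mechanism can be made concrete with a toy sketch. The logits below are invented for illustration: they imagine a model that has seen "Sydney" co-occur with "Australia" more often than "Canberra," so the statistically likeliest token is the factually wrong one.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over tokens."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the token after "The capital of Australia is".
# Nothing in this selection step consults a fact store -- only co-occurrence
# statistics absorbed during training.
logits = {"Sydney": 3.1, "Canberra": 2.4, "Melbourne": 1.0}

probs = softmax(logits)
next_token = max(probs, key=probs.get)
print(next_token)  # "Sydney" -- the most probable token, not the most accurate
```

The point of the sketch is that "most probable" and "true" are different objectives; the model optimizes only the former.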

A deeper issue concerns "calibration." An ideal AI system should express uncertainty when uncertain — that is, its confidence level should match its accuracy rate. However, research shows that current LLMs universally suffer from "overconfidence" — even when generating completely incorrect content, the model's tone remains certain and authoritative.[6] ChatGPT's "confident" confirmation of fabricated cases in Mata v. Avianca is a textbook manifestation of this calibration failure. The severity of this problem lies in its exploitation of human "authority bias" — when information is presented in a certain, professional tone, humans tend to trust it, even when it is wrong.
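Calibration failure can be measured. A standard metric is expected calibration error (ECE): bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. The sketch below is minimal and the sample data is invented to illustrate an overconfident model.

```python
def expected_calibration_error(preds, n_bins=5):
    """preds: list of (confidence, was_correct) pairs.
    ECE = weighted average of |accuracy - mean confidence| per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    ece, n = 0.0, len(preds)
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# An overconfident model: claims 90% confidence but is right only half the time.
overconfident = [(0.9, True), (0.9, False)] * 10
print(round(expected_calibration_error(overconfident), 2))  # 0.4
```

A well-calibrated system would score near zero; the 0.4 gap here is exactly the "certain tone, wrong answer" pattern described above.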

II. The Cost of Hallucination in High-Risk Domains: Law, Healthcare, and Finance

The severity of harm from AI hallucination is directly proportional to the risk level of its application context. In low-risk scenarios (such as content creation or brainstorming), a certain degree of hallucination can be tolerated or even beneficial. But in high-risk scenarios, hallucination can cause irreversible harm.

The legal domain is a high-risk zone for AI hallucination. Beyond the Mata v. Avianca case, similar incidents have been occurring worldwide. In 2024, a Canadian lawyer was sanctioned by the court for using AI to generate legal documents containing fabricated case precedents.[7] Research shows that when asked to answer legal questions, GPT-4's hallucination rate is approximately 6.2% — seemingly low on the surface, but in the legal context, a 6.2% error rate means that roughly one in every 16 legal responses may contain fabricated legal authority.[8] More dangerously, hallucinations in the legal domain often exhibit a high degree of "credibility camouflage" — AI-generated fake case precedents typically include complete case number formats, plausible court names, and seemingly logical legal reasoning, making it possible for even experienced lawyers to be misled without independent verification.

The healthcare domain presents even more direct risks. Multiple studies have assessed LLM hallucination rates in medical consultations: one analysis of GPT-4 answering medical questions found that approximately 4.2% of responses contained clinically incorrect information; another study covering multiple models showed hallucination rates ranging from 3% to 27%, depending on the complexity of the question and the model version.[9] In oncology, researchers found that ChatGPT, when answering questions related to cancer screening and treatment, gave recommendations inconsistent with current clinical guidelines approximately 12.5% of the time.[10] When AI confidently provides erroneous diagnostic advice — for example, advising a patient that further testing is unnecessary, or recommending an inappropriate treatment plan — the consequences can be fatal.

The financial domain faces equally significant hallucination risks. AI-generated financial analyses may contain fabricated market data, nonexistent research reports, or incorrect financial ratios. Bloomberg's research team found that even models specifically trained on financial data (such as BloombergGPT) still hallucinate when generating specific numerical financial information.[11] In a domain where information timeliness is paramount and decision impacts involve enormous sums, the marginal harm of hallucination far exceeds that in other contexts.

III. An Information Economics Perspective: Hallucination as Market Failure

From the perspective of information economics, AI hallucination can be understood as a special form of "information asymmetry" and "market failure."[12]

Nobel laureate in economics George Akerlof described a mechanism in his classic paper "The Market for Lemons": when buyers cannot distinguish between high-quality and low-quality products, the market experiences "adverse selection" — high-quality products are driven out by low-quality ones.[13] AI hallucination creates a similar dynamic in the information market: when high-quality AI-generated content (accurate analyses, correct facts) and low-quality content (hallucinations, fabricated citations) are indistinguishable in appearance, the entire market for AI-generated content faces a trust crisis.

This analytical framework explains why "better models" alone cannot solve the hallucination problem. Even if next-generation models reduce their hallucination rate from 10% to 1%, as long as users cannot reliably identify which outputs belong to that 1% of hallucinations, the trust problem persists.[14] This is a structural problem requiring structural solutions — not better technology, but better institutions.

Michael Spence's signaling theory provides a useful framework.[15] In the context of AI hallucination, "signals" can take various forms: AI systems' uncertainty quantification of their outputs ("I have 85% confidence in this answer"); verifiability of cited sources (clickable, authentic reference links); third-party compliance certifications (independent certifications after passing hallucination rate testing). The establishment of these signaling mechanisms requires coordinated action among regulators, standards-setting bodies, and industry.

IV. Technical Mitigation Strategies: From RAG to Uncertainty Quantification

Although AI hallucination cannot be completely eliminated, its frequency and impact can be significantly reduced through various technical means.

Retrieval-Augmented Generation (RAG) is currently the most widely adopted hallucination mitigation technique.[16] RAG's core logic is: before an LLM generates an answer, it first retrieves relevant document fragments from an external knowledge base, then provides these fragments as context to the model, making its answers "evidence-based" rather than "generated from thin air." Research shows that compared to pure LLM generation, RAG can reduce factuality hallucination rates by 40% to 70%.[17] However, RAG is not a panacea — if the knowledge base itself contains erroneous information, or the retrieval system fails to find relevant documents, RAG can still produce hallucinations. Furthermore, models sometimes "ignore" the retrieved context and generate answers based on their own parametric knowledge — a phenomenon known as "context neglect."
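The RAG flow described above can be sketched in a few lines. This is a deliberately simplified stand-in: the retriever scores documents by word overlap (a real system would use embedding similarity), the corpus and query are invented, and the assembled prompt would be sent to an LLM rather than printed.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by simple word overlap with the query -- a toy
    stand-in for an embedding-based retriever."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, corpus):
    """Ground the model: instruct it to answer only from retrieved passages."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the passages below. "
        "If they do not contain the answer, say you do not know.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    "The bereavement fare policy must be requested before travel.",
    "Checked baggage allowance is two bags on international routes.",
]
prompt = build_rag_prompt("What is the bereavement fare policy?", corpus)
print(prompt)
```

The explicit "say you do not know" instruction is the key design choice: it gives the model a sanctioned alternative to inventing an answer, which plain generation lacks.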

Chain of Thought (CoT) reasoning and Self-Consistency represent another important set of technical approaches. By requiring the model to show its step-by-step reasoning process before giving a final answer, hallucinations become easier for human reviewers to identify. The self-consistency method requires the model to generate multiple independent answers to the same question, then takes the majority-consistent result — if multiple answers contradict each other, this itself is a warning sign of hallucination.[18]
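The self-consistency vote is straightforward to implement. The sketch below uses a stub in place of a real model call (a production system would sample the LLM several times with nonzero temperature); the canned answers and the 0.6 agreement threshold are illustrative assumptions.

```python
from collections import Counter

def self_consistency(sample_fn, question, n=5, threshold=0.6):
    """Sample n independent answers and take the majority result.
    A low agreement score is itself a hallucination warning sign."""
    answers = [sample_fn(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return best, agreement, agreement >= threshold

# Stub model: draws answers from a canned list instead of calling an LLM.
canned = iter(["42", "42", "41", "42", "40"])
answer, agreement, trusted = self_consistency(lambda q: next(canned), "q", n=5)
print(answer, agreement, trusted)  # 42 0.6 True
```

In practice the interesting output is often not the majority answer but the agreement score: when it drops, the output should be routed to human review rather than served directly.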

Uncertainty Quantification attempts to fundamentally address LLMs' "overconfidence" problem. This includes token-level probability calibration (making the model's output probability distribution more accurately reflect its actual accuracy rate), semantic-level uncertainty estimation (assessing the credibility of entire statements rather than individual tokens), and ensemble methods (estimating uncertainty through output divergence across multiple models).[19] Google's Gemini 1.5 and Anthropic's Claude 3.5 have integrated uncertainty expression capabilities to varying degrees — for example, adding qualifiers such as "I'm not entirely sure" or "to my knowledge" when answering knowledge-based questions.
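The ensemble variant of uncertainty quantification can be sketched with a simple disagreement score: ask several models (or several samples of one model) the same question and measure how often their answers differ. The example answers below are invented.

```python
from itertools import combinations

def ensemble_disagreement(answers):
    """Fraction of answer pairs that disagree: 0.0 means full agreement,
    values near 1.0 signal high uncertainty."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

# Three hypothetical models asked the same factual question.
print(ensemble_disagreement(["Paris", "Paris", "Paris"]))        # 0.0
print(round(ensemble_disagreement(["Paris", "Lyon", "Paris"]), 2))  # 0.67
```

This exact-match comparison only works for short factual answers; for free-form text, semantic-level methods compare the meaning of outputs rather than their surface strings.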

Fact-Verification Pipelines represent a post-processing approach — after the LLM generates output, another system (which can be another AI model or structured knowledge graph queries) automatically verifies the factual claims in the output.[20] This is analogous to a newsroom's fact-checking process — after a journalist writes an article, an independent fact-checking team verifies key claims. Automated fact-verification pipelines can add a layer of quality assurance to AI outputs without sacrificing generation speed.
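A minimal verification pass might look like this: decompose a generated answer into atomic claims, then check each against a structured store. The dictionary-backed store, the triple format, and the three-way verdict are all illustrative simplifications; a real pipeline would query a knowledge graph or search index.

```python
# Toy knowledge store: (subject, predicate) -> known value.
KNOWN_FACTS = {
    ("Mata v. Avianca", "year"): "2023",
    ("Mata v. Avianca", "court"): "S.D.N.Y.",
}

def verify_claims(claims):
    """claims: list of (subject, predicate, value) triples.
    Each claim gets one of three verdicts: a claim the store cannot
    answer is flagged 'unverifiable' rather than silently passed."""
    results = []
    for subject, predicate, value in claims:
        known = KNOWN_FACTS.get((subject, predicate))
        if known is None:
            verdict = "unverifiable"
        elif known == value:
            verdict = "supported"
        else:
            verdict = "contradicted"
        results.append((subject, predicate, verdict))
    return results

claims = [
    ("Mata v. Avianca", "year", "2023"),            # matches the store
    ("Mata v. Avianca", "court", "Second Circuit"), # contradicts the store
]
for subject, predicate, verdict in verify_claims(claims):
    print(subject, predicate, verdict)
```

The "unverifiable" verdict matters as much as "contradicted": a checker that only flags known falsehoods gives unearned cover to claims it simply cannot evaluate.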

V. Institutional Design: A Three-Layer Governance Architecture

Technical measures are necessary but insufficient. A comprehensive AI hallucination governance framework requires simultaneous construction across three dimensions: technology, process, and institution.

The technical layer aims to minimize the generation of hallucinations. Specific measures include: deploying RAG to reduce factuality hallucination rates; implementing uncertainty quantification so models express uncertainty when uncertain; establishing fact-verification pipelines to automatically detect factual errors in outputs; and conducting regular hallucination benchmarking to track system hallucination performance across different task types.

The process layer aims to promptly catch hallucinations after they occur. This means defining human review standards of varying stringency according to the risk level of the application context. In high-risk scenarios (such as legal documents, medical advice, financial reports), every AI-generated output should be reviewed by personnel with domain expertise; in medium-risk scenarios, sample-based review can be adopted; in low-risk scenarios, user feedback can be relied upon. Additionally, a "hallucination incident reporting system" should be established, enabling users or reviewers to quickly report hallucinations and trigger an investigation process when they are discovered.[21]
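This risk stratification translates directly into a routing policy. The sketch below is a minimal illustration of the idea; the tier names, sample rate, and policy fields are assumptions, not a prescribed standard.

```python
# Map each application risk tier to a review policy (values illustrative).
REVIEW_POLICY = {
    "high":   {"review": "every_output", "reviewer": "domain_expert"},
    "medium": {"review": "sampled", "sample_rate": 0.10},
    "low":    {"review": "user_feedback"},
}

def route_for_review(context_risk):
    """Return the review policy for an AI output's risk tier.
    Unknown tiers fail loudly rather than defaulting to no review."""
    policy = REVIEW_POLICY.get(context_risk)
    if policy is None:
        raise ValueError(f"unknown risk level: {context_risk}")
    return policy

print(route_for_review("high")["review"])  # every_output
```

The deliberate design choice is the exception on unknown tiers: an unclassified use case should block until it is assessed, not quietly fall through to the weakest review level.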

The institutional layer aims to build long-term governance infrastructure. The National Institute of Standards and Technology (NIST) published the AI Risk Management Framework (AI RMF 1.0) in 2023, providing a useful reference architecture.[22] The framework is organized around four core functions: "Govern" — establishing the organizational culture and accountability structure for risk management; "Map" — identifying risk scenarios for AI systems; "Measure" — quantifying and tracking risk indicators; and "Manage" — implementing risk mitigation and response measures. Applying this framework to AI hallucination governance means enterprises need to: clarify accountability for AI hallucination (who is responsible for AI's erroneous outputs?); establish quantitative indicators for hallucination risk (what is the acceptable hallucination rate in different business scenarios?); and develop contingency plans (response procedures when AI hallucination causes actual harm).

The EU AI Act provides an additional legal framework at the institutional level. Under the Act, providers of high-risk AI systems are obligated to ensure "sufficient accuracy, robustness, and cybersecurity."[23] Although the Act does not explicitly mention the word "hallucination," hallucination — as the antithesis of accuracy — undoubtedly falls within the scope of this obligation. This means that for high-risk AI systems deployed in the EU, controlling hallucination rates is not only a best practice but a legal obligation.

VI. Organizational Culture: From "Blind Trust" to "Critical Collaboration"

The effectiveness of a governance framework ultimately depends on the people who use AI systems. No matter how well-designed the technology and institutions are, if organizational members lack critical thinking about AI outputs, hallucinations will still cause harm.

Research shows that humans exhibit two biases when interacting with AI: "automation bias" — over-trusting the outputs of automated systems even in the presence of contradictory evidence; and "algorithm aversion" — completely refusing to use AI after having encountered AI errors.[24] Neither of these is a constructive attitude. The ideal state is "calibrated trust" — where the level of trust in AI matches its actual reliability on specific tasks.

Building calibrated trust requires systematic effort at the organizational level. First, AI literacy training — enabling employees to understand how LLMs work, their inherent limitations, and common patterns of hallucination. When employees understand that "AI is not answering your question, it is predicting the most probable next token," they will naturally maintain a more appropriate level of skepticism toward AI outputs. Second, establishing a "verification culture" — treating the independent verification of AI outputs as a demonstration of professional competence rather than a loss of efficiency. In high-risk domains, "what AI said" should never be the final answer — it should be "what AI suggested, which parts I verified, and my conclusion is..."

As I argued in my article on Vibe Coding and the Software Engineering Crisis, where I discussed the concept of "cognitive debt," the organizational risk of AI hallucination lies not only in erroneous outputs themselves but also in the gradual degradation of human professional judgment that results from long-term AI dependence. If professional workers habitually accept AI outputs without independent thinking, their very ability to identify hallucinations will progressively erode: a self-reinforcing vicious cycle.

VII. Conclusion: The Wisdom of Coexisting with Imperfection

AI hallucination will not be completely eliminated — just as human cognitive biases will not be completely eliminated. The value of generative AI lies in its powerful pattern recognition, language generation, and knowledge integration capabilities; hallucination is an inherent byproduct of these capabilities. Pursuing "zero-hallucination" AI is not only technically unrealistic but may be conceptually self-contradictory — because the very characteristics that enable LLMs to produce creative outputs are precisely what cause them to hallucinate.

Therefore, the right question is not "how to eliminate AI hallucination," but "how to build a safe, reliable, and trustworthy AI usage system given the persistent existence of AI hallucination." The answer to this question does not lie in technological breakthroughs — technology can only reduce, not eliminate, hallucination — but in the refinement of governance frameworks: clear accountability, appropriate risk stratification, effective human-AI collaboration processes, and continuous organizational learning.

From a broader perspective, the governance of AI hallucination is a microcosm of building "trustworthy technology" in the AI era. Like other aspects of enterprise AI governance, hallucination governance requires cross-disciplinary collaboration among technical teams, legal compliance, risk management, and business units. No single department or single technical solution can solve this problem — what is needed is systems thinking and institutional responses. And in this process, perhaps the most critical cognitive shift is: accepting AI's imperfection is not a failure — it is the starting point for responsible AI use.

References

  1. Mata v. Avianca, Inc. (2023). No. 22-cv-1461 (PKC), Order and Opinion. United States District Court, Southern District of New York. justia.com
  2. Verge, The. (2024). Google's AI told users to put glue on pizza. theverge.com
  3. Moffatt v. Air Canada (2024). Civil Resolution Tribunal, British Columbia. canlii.org
  4. Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55(12), 1–38. doi.org
  5. Vaswani, A. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. arXiv:1706.03762
  6. Kadavath, S. et al. (2022). Language Models (Mostly) Know What They Know. arXiv:2207.05221
  7. Canadian Broadcasting Corporation. (2024). B.C. lawyer sanctioned for using AI-generated fake cases. cbc.ca
  8. Dahl, M. et al. (2024). Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. Journal of Legal Analysis, 16(1). doi.org
  9. Umapathi, L. K. et al. (2023). Large Language Models in Medical Question Answering: A Systematic Evaluation of Hallucination. arXiv:2309.05922
  10. Chen, S. et al. (2023). Evaluation of ChatGPT in Answering Cancer Screening and Treatment Questions. JAMA Oncology. jamanetwork.com
  11. Wu, S. et al. (2023). BloombergGPT: A Large Language Model for Finance. arXiv:2303.17564
  12. Akerlof, G. A. (1970). The Market for 'Lemons': Quality Uncertainty and the Market Mechanism. The Quarterly Journal of Economics, 84(3), 488–500. doi.org
  13. Akerlof, G. A. (1970). Ibid.
  14. Huang, L. et al. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv:2311.05232
  15. Spence, M. (1973). Job Market Signaling. The Quarterly Journal of Economics, 87(3), 355–374. doi.org
  16. Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. arXiv:2005.11401
  17. Shuster, K. et al. (2021). Retrieval Augmentation Reduces Hallucination in Conversation. Findings of EMNLP. arXiv:2104.07567
  18. Wang, X. et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171
  19. Gal, Y. & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML. arXiv:1506.02142
  20. Min, S. et al. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. arXiv:2305.14251
  21. ISO/IEC 42001:2023. Information technology — Artificial intelligence — Management system. iso.org
  22. National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. nist.gov
  23. European Parliament and Council. (2024). Regulation (EU) 2024/1689 — Artificial Intelligence Act, Article 15. eur-lex.europa.eu
  24. Parasuraman, R. & Riley, V. (1997). Humans and Automation: Use, Misuse, Disuse, Abuse. Human Factors, 39(2), 230–253. doi.org